FOREWORD



This manual provides essential information about the development, interpretation, and correct use of 
personnel classification tests and results. For proper operation of the personnel classification system as a 
whole at all echelons, knowledge of AR 345-5, TM 12-425, 12-426, 12-427, 12-405, 12-406, and 12-407 is
required. 

The Adjutant General is the War Department operating agency charged with the development, con- 
struction, validation, and standardization of all Army personnel screening tests, except air crew tests. (See 
Sec. II, WD Circular 312, 1943.) Only tests which have been authorized by The Adjutant General are used 
in classifying military personnel. Such tests are described in this manual and changes thereto. Authorized 
tests are also listed in monthly issues of FM 21-6 which prescribes the procedures for requisitioning tests 
and test supplies, and indicates the distributing agencies from which they may be procured. 

Field installations recognizing a need of new tests for special purposes are encouraged to submit their 
problems through appropriate channels to The Adjutant General, Attention: Classification and Replace- 
ment Branch, Personnel Research Section. The use of military personnel to try out new or unauthorized 
tests is not permitted unless the purposes of the experiment and the procedures to be followed have first 
been considered by The Adjutant General and received his approval. 

This manual has been published in loose-leaf form to facilitate changes. Such changes will be supplied 
on a page basis, and will be published as required. As change pages are received they will be inserted in 
their proper places, and the replaced pages destroyed. 

Each page of the manual bears the date of publication in its upper inside corner. Pages which represent 
changes will carry the date and number of the change. 

Paragraphs are numbered consecutively throughout the entire manual. Paragraphs carrying decimal 
suffixes will indicate newly added material; for example, a paragraph numbered 23.1 will represent the first
main paragraph following paragraph 23.

Pages are numbered consecutively throughout the manual. If new pages are added, these will carry 
alphabetical suffixes. For example, if a new page is added between 51 and 52, this page will be numbered 
51-A.

This manual supersedes TM 12-260, 31 December 1942

CHAPTER I 
INTRODUCTION 



1. Classification and Modern War

Modern warfare has outmoded the old-fashioned
all-purpose soldier. The Army of today is a “team 
of teams” each of which is made up of men who de- 
pend on one another to do their particular and often 
specialized jobs. Inefficiency in combat is paid for 
in human lives. It is, therefore, of vital importance 
to place in every assignment men who are physi- 
cally, emotionally, and mentally qualified to do 
what is required. Every officer who is responsible 
for deploying troops or for making direct man-from- 
man assignments should be able to find out which 
soldiers are capable of doing each of the many Army 
jobs, and how well they can be expected to do them. 
Whether or not officers are able to learn these things 
readily and accurately depends mainly on two 
factors:

a. The efficiency of the Army’s system of per- 
sonnel classification. 

b. The officer’s own understanding of classi- 
fication information, and his ability to apply it to 
his practical problems of utilization of men. 

2. Purpose of Classification 

The millions of men and women who make up the 
Army possess different combinations of skills and 
abilities. They represent experience in some eight 
thousand civilian occupations. There are men who 
can build a bridge or drive a nail, men who can plow 
a furrow, or plead a case. There are men who can 
plan and organize, and there are men who have skill 
and strength in their hands. All of them must be 
apportioned among more than five hundred types 
of Army jobs, each of which requires a somewhat 
different combination of skills, aptitudes, and train- 
ing. The task of the classification system is to 
discover the military abilities of these men and their 
aptitudes (or “trainability”) for Army jobs. This
information must then be analyzed, recorded and 
passed on, with systematic recommendations con- 
cerning assignment. 

3. Scope of This Manual 

Personnel research, the subject of this manual, is the
scientific work which makes sound classification
possible. Personnel research has the following main
missions:

a. To develop and evaluate tests, rating scales, 
and interview methods which are the instruments 
used for finding out how much of the various skills 
and aptitudes required for Army jobs and Army 
training each soldier may possess. 

b. To develop methods for expressing and re- 
cording each man’s skill and aptitude in the most 
useful way possible. 

c. To inventory the abilities found among 
soldiers as an aid to selection, training, deploy- 
ment and redeployment. 

4. Need for Scientific Methods 

The proper deployment of abilities is as essential 
to success in battle as the tactical disposition of the 
men and materiel. Every efficient division rep- 
resents a balanced assembly of skills, aptitudes, 
and physical characteristics, each present in suffi- 
cient numbers for the task at hand. As attrition 
and casualties thin the ranks, these skills and apti- 
tudes must be replaced at once and in kind. There 
must be an unceasing flow across enormous reaches 
of terrain and water, and this replacement stream 
must always contain the various kinds of men that 
may be required at a particular place and time. 
Both the building of units and their constant re- 
pair by efficient replacement depend upon precise 
knowledge concerning millions of men. Only scientif- 
ic methods can furnish such knowledge. Personal 
judgments take too much time, and vary too much 
among the persons making them. They are too 
difficult to record and pass along in a form that can 
be readily and correctly interpreted. The scientif- 
ic methods of personnel research enable officers 
to know their men on short acquaintance. They 
enable higher commanders to know in detail the
forces which are created in a few brief months out 
of an unsorted mass of peaceful civilians and re- 
plenished from a “manpower barrel” which is by 
no means bottomless. 

5. Ability is an Equation

It happens seldom, if ever, that one characteristic
is the key to success in performing an Army as- 
signment. A man with a “strong” mind and a 
weak back will not be fit for duty where he will have 
to dig fox holes; a man with both a “strong” mind 
and a strong back will not be satisfactory in com- 
bat if he faints at the sight of blood. Ability is 
an equation in which all important characteristics
are balanced against the job. On one side of the
equation is the assignment itself, broken down into 
the demands it makes on the individual. On the 
other side are the characteristics of the men being
considered. These characteristics are of the fol- 
lowing general kinds: 

a. Physical characteristics, such as strength, 
endurance, agility, defects, malformations, and 
other bodily traits which have a bearing on ability 
to perform assignments. 

b. Emotional characteristics, such as tendencies
toward depression, split personality, and other 
weaknesses which may make a man crack up under 
the pressures of training or the rigors of combat. 

c. Mental characteristics, such as ability to learn, 
and skill acquired through education, training, and 
experience. 

6. Measure of a Man 

The three kinds of characteristics listed in paragraph 5
must all be measured or reliably estimated
before the Army knows whether a man is soldier 
material. It is necessary to measure some (and 
sometimes all) of them at various stages of a soldier's 
service to determine what special training it is prof- 
itable to give him and what assignments he can be 
expected to perform satisfactorily. Throughout his 
military career, from basic training to combat and 
return (fig. 1), his abilities must be continually 
reviewed and evaluated. As he acquires new
military skills through training and experience, or
as the changing needs of the Army demand, his 
assignment is subject to revision. Each change is 
made in the light of fresh information about the 
soldier and up-to-date requirements of the tactical 
situation. 

7. Usefulness of This Manual 

Personnel research provides the procedures (tests, 
rating scales, etc.) used by the Army to measure 
mental characteristics. These procedures are de- 
veloped to meet practical needs and designed for 
use under those conditions usually met with in Army
installations. They are in most cases special-pur- 
pose instruments. That is, they must be used to 
obtain only the information they were designed to 
discover, and they must be used in the proper 
fashion. Moreover, the results (the scores, ratings, 
etc., which describe the abilities of men and predict 
how well they will do Army jobs) must be inter- 
preted properly if they are to have any practical 
value. This manual explains how measurement 
procedures are developed in the Army, how they 
are to be used, and how to apply the results to 
practical problems of selection and assignment. 
It is directed mainly to those officers and men who 
are principally concerned with the technical phases 
of the classification system — the classification offi- 
cers, personnel consultants, clinical psychologists, 
and the enlisted classification specialists in all 
echelons. But it will also aid all officers who are 
responsible for the conditions under which the 
work of classification is carried on and its products 
utilized. Officers responsible for the disposition 
and deployment of skills and abilities of men, and 
those who make assignments, will find here a key 
to one of the Army’s most valuable assets — the 
classification data concerning every soldier in the 
United States Army. 

Section I. MEASUREMENT PROBLEM
8. General 

Classification begins with jobs. The five hundred- 
odd Army assignments, each created by practical 
necessity, determine the kinds of tests to be built 
and the use to be made of their findings. Once a 
definable job is recognized as necessary by the War 
Department, it is added to the table of organ- 
ization of the unit in which the specialty is required. 
Assignments listed in tables of organization are 
analyzed to define clearly and in detail just what a 
man is expected to do, and the resulting job de- 
scriptions are added to TM 12-427. The prep- 
aration and revision of these job descriptions go 
on continuously as new assignments are developed 
to meet the changing needs of the Army. The 
furious rate of growth during recent years is well 
illustrated by the fact that 22 new military jobs 
have been added since 1940 to enable the Army to 
maintain efficient radar installations. The degree 
of proficiency required in each assignment must 
be ascertained and training courses established 
when required to enable men to meet job require- 
ments. Classification is the process of selecting for 
each assignment the men most likely to reach re- 
quired proficiency. There are three general types 
of assignment, each of them requiring different 
techniques applied during the process of classifi- 
cation and selection. (See ch. 7, 8, and 9 for 
specific techniques and points at which they are 
applied.) The types of assignments are as follows: 

a. Assignments for which almost any inductable
man can reach required proficiency after a short 
period of practice. 

b. Assignments, such as truck driver and automobile
mechanic, where skill acquired as a civilian
may be sufficient after brief indoctrination and on- 
the-job training. 

c. Assignments for which weeks or months of 
intensive and special training are required to bring 
men up to requisite efficiency. The range of these 
training courses and their importance to the modern 
Army is illustrated by the facts in table I. 

9. Individual Differences 

The men who must be selected and assigned to 
all the training courses and Army occupations are
a varied lot. Even after screening by the physical
examination at the induction station, soldiers 
differ widely in health, strength, size, and endurance. 
The Army contains men who can march 50 miles a 
day with full field equipment, and men who could 
scarcely cover a few miles under the same conditions. 



Table I. Number of special training courses for officers and
enlisted men offered by various arms and services

Arm or service                              No. of courses   No. of courses
                                            for officers     for EM

Army Air Forces:
  Technical Training Command                      21              116
Army Ground Forces:
  Antiaircraft Artillery                          11               10
  Armored Force                                    8                6
  Cavalry                                          6                9
  Coast Artillery                                  4                9
  Field Artillery                                 10               11
  Infantry                                         6                6
  Parachute School                                 4                4
  Tank Destroyer                                   6                6
Army Service Forces:
  Adjutant General’s Department                    3                5
  Army Exchange Service                            1                0
  Corps of Chaplains                               1                0
  Chemical Warfare Service                        13                6
  Corps of Engineers                              11               20
  Finance Department                               4                3
  Judge Advocate General’s Department              1                0
  [illegible]                                     24               19
  Ordnance Department                             25               40
  Provost Marshal General                          5                2
  Quartermaster Corps                             13               12
  [illegible]                                     16               57
  Special Services                                 2                1
  Transportation Corps                             2                0

Totals                                           197              342

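The totals printed in table I can be checked by simple addition. The sketch below is only an illustration: the figures are transcribed from the table, while the dictionary layout, the function name `column_totals`, and the two "unnamed row" labels (standing in for entries whose names are illegible in this copy) are assumptions made for the example.

```python
# Course counts from table I as (courses for officers, courses for EM).
# The two "unnamed row" keys are placeholders for illegible row labels.
TABLE_I = {
    "Technical Training Command": (21, 116),
    "Antiaircraft Artillery": (11, 10),
    "Armored Force": (8, 6),
    "Cavalry": (6, 9),
    "Coast Artillery": (4, 9),
    "Field Artillery": (10, 11),
    "Infantry": (6, 6),
    "Parachute School": (4, 4),
    "Tank Destroyer": (6, 6),
    "Adjutant General's Department": (3, 5),
    "Army Exchange Service": (1, 0),
    "Corps of Chaplains": (1, 0),
    "Chemical Warfare Service": (13, 6),
    "Corps of Engineers": (11, 20),
    "Finance Department": (4, 3),
    "Judge Advocate General's Department": (1, 0),
    "unnamed row 1": (24, 19),
    "Ordnance Department": (25, 40),
    "Provost Marshal General": (5, 2),
    "Quartermaster Corps": (13, 12),
    "unnamed row 2": (16, 57),
    "Special Services": (2, 1),
    "Transportation Corps": (2, 0),
}


def column_totals(table):
    """Return (total officer courses, total enlisted courses)."""
    officers = sum(officer for officer, _ in table.values())
    enlisted = sum(em for _, em in table.values())
    return officers, enlisted


officer_total, enlisted_total = column_totals(TABLE_I)
print(officer_total, enlisted_total)  # 197 342
```

Both sums agree with the "Totals" row, which suggests that the figures in the two unlabeled rows were transcribed correctly even though their names were not.
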

Soldiers differ in such characteristics as resistance 
to extremes of temperature, efficiency at high alti- 
tudes, and ability to see at night. In regard to 
this latter characteristic, which is so important in 
modern warfare, tests under field conditions show
a wide range in the ability to recognize objects 
(tanks, trucks, howitzers, machine guns, etc.) at 
night. In one such test, a few men could recognize 
the object (a J^-ton truck) at a distance of 90-99 

[...] differences in physique, stamina, and sensory
acuity. These psychological characteristics are not 
so directly observable as physical traits. One 
cannot tell by looking at a man, or by determining
his physical dimensions, whether he can add or 
spell, drive a nail or repair a carburetor. Nor, what 
is of more importance, can one tell by such direct 
observation whether the man can learn to do these 
things in the relatively short time available for
training. Because these abilities and aptitudes are 
not directly observable they could, without the 
benefit of scientific techniques, be overlooked in 
selecting men for specialist training or assigning 
men to Army jobs. Yet, in every respect, the range 
of skills and aptitudes among Army men is tre- 
mendous. This holds true for general overall 
ability and for the hundreds of special skills that
are required by the Army. Not all men, for ex- 
ample, possess the same capacity for absorbing 
basic military training. Figure 3 illustrates the 
distribution of men in one training unit rated 
according to soldier performance; that is, according 
to their value to the unit as soldiers. Most of the 
men are average soldiers. In fact, this is the real 
meaning of the term “average.” Very few men 
are rated as of no value or of outstanding value; 
most men fall into the middle (the average) group.
Men also tend to fall into definable groups when 
ranked according to specialized skills. Figure 4 
shows the performance of the men in a class for
training radio code operators. After 8 weeks of
training, some of the men were receiving code 
messages at the rate of 18-20 words per minute, 
while others were not able to do better than a rate 
of 2 words per minute. 

b. Men differ enormously not only in the rate 
at which they can acquire new skills, but also in 
the level of skill they can reach with training. 
Contrary to popular belief, individual differences 
are not ironed out by training. Even if endless 
time were available, it is doubtful if the poorest 
performers could be brought up to the level of the 
best. On the contrary, practice is more likely to 
accentuate differences in performance. Those who
are more skillful to start with will usually
improve faster than the less skillful, with the result 
that the range of abilities after training is even 
greater than before. Figure 4 shows that, although 
the whole class did better after 12 weeks of train- 
ing than at the end of 8 weeks, the differences in 
code receiving speed had become even more striking. 

c. The fact that each individual possesses more 
of some skills and aptitudes than of others is equally 
important to the Army. The Army cannot afford 
blundering which results in such folly as the waste 
of a morale-building cook to make a good truck- 
driver or failure to recognize and train a man whose 
endowments might have made him a platoon ser- 
geant capable of saving his whole outfit in a critical 
hour. Such errors must be reduced to a minimum. 



Figure 3. Distribution of performance ratings of a group of trainees
(horizontal axis: rating, from “of no value” to “of outstanding value”).

Figure 4. Variability in performance of radio code operator
trainees after 8 and 12 weeks of training.

The Army must choose for training assignments the
men who will respond most readily to that particular
training. These principles are self-evident. But
the methods successful in selection are anything
but obvious, rule-of-thumb methods. They were
not, in fact, available until very recently. For, to 
select the best, all men must be measured; and relia- 
ble, accurate, and useful techniques for measuring 
the varying, and often hidden, traits of men are 
almost as modern as radar. The fact that these
techniques and the effective methods of applying 
them are so new, heightens the importance of a full 
understanding of principles and procedures on the 
part of all responsible personnel. 

Section II. MEASURING SKILLS AND 
APTITUDES 

10. General 

The problem of determining the skills and aptitudes
of men is not new. It has been encountered wher- 
ever some men have been forced to make judg- 
ments about others. Yet it has not always been 
as urgent as it is now. In other circumstances, 
when a man’s ability to perform a job is questioned, 
he can be tried on that job. If he succeeds, he 
must have the necessary ability; if he fails, he can 
be tried out on a different job. Such trial and 
error methods, though straightforward, are feasible 
only when the importance of getting the most 
competent men on the job far outweighs the tremendous
waste in time and effort. They are not
practicable, as a general rule, in classification work 
in the Army. With millions of men to be classi- 
fied, selected, and assigned, and with training time 
and facilities at a premium, it is obviously impossible 
to try each man on the hundreds of Army jobs in 
order to discover the one to which he should be 
assigned. Techniques are required by means of 
which the abilities of men in large numbers can 
be determined in advance of assignment. And it 
is also essential that these findings be dependable, 
yet obtained with a minimum expenditure of time. 

11. Traditional Methods 

Because the psychological characteristics of men 
are not directly observable, it was inevitable that 
attempts should be made to find other signs of 
these characteristics. Through the course of time, 
numerous systems of “character analysis” have
evolved. None of them was systematically de- 
veloped nor consciously evaluated. Rather, each 
simply grew as a part of the folklore, and each 
attracted its own circle of devotees. The earliest 
of such systems were cloaked in mystery and 
ritualistic hocus-pocus. They looked not to the
man himself, but read his character and destiny
in some outside circumstance — the constellations, 
the entrails of birds, or in the hallucinated visions 
of a crystal gazer. These cults are, of course, 
beyond the realms of reason. In more recent times, 
however, a series of pseudo-scientific methods 
have gained some popular appeal. These are 
usually based on the assumption that psychological 
characteristics can be detected through various 
physical signs or symptoms. 

a. The most widely held of all these methods 
is that based on the assumption that the face is an 
index of the inner man — that traits and skills are 
reflected in the size and arrangement of the features. 
Thus, a high forehead is supposed to indicate intel- 
ligence, and red hair, a “fiery” temperament. Most 
of these beliefs are obviously based on far-fetched 
generalizations or analogies (“men with large ears 
are inquisitive” and “a fox-like face indicates 
trickiness”), and are seen to be patently absurd 
when stated in formal terms. Nevertheless, it is 
a temptation to “size up” individuals as bright or 
dull, crafty, sensitive, or humorous according to 
the shape or size of their facial features. When 
these haphazard judgments are used to determine 
assignment or disposition, the situation may be- 
come serious. Accumulated evidence, obtained 
under carefully controlled scientific conditions, has 
been able to lend no grain of truth to these simple 
beliefs. 

b. A closely allied system of beliefs is that 
known as phrenology, which claims that a man’s 
make-up (his skills and talents, personality and
character) can be appraised by judging the relative
size and position of irregularities of his cranium. 
Unfortunately for the system, the brain is not a 
mosaic of traits; it does not grow with exercise, 
like a muscle, and its size and shape do not deter- 
mine the size and contours of the external surface 
of the skull. 

c. Graphology, which purports to read the 
nature of men in their handwriting, has a certain 
superficial plausibility. It claims (to choose typi- 
cal examples) that the clear thinker writes clearly, 
the forceful personality with a bold stroke, and the 
ambitious individual on a line slanting upward. 
As in most of these “character analysis” systems, 
this is nothing but analogy — a seductive but 
dangerous way of reasoning. Actually, no one has 
ever been able to demonstrate the validity of 
graphology in a properly controlled experiment. 
In several such experiments, “expert” graphologists
achieved no better than chance success in
matching samples of handwriting to the persons 
who wrote them. 

12. Indirect Observation 

Modern science looks to the men themselves,
rather than to extraneous characteristics. As has 
already been pointed out, it is not practicable to 
observe the men directly on the job in order to
discover whether or not they possess the requisite 
abilities. They must be evaluated by the method 
employed throughout all science; that is, by in- 
direct observation which makes it possible to 
predict performance. One method employed by the 
Army is to evaluate the previous occupational and 
educational history of men, since what one has done 
is a clue to what may be expected of him. Inter- 
views and personal history questionnaires therefore 
have a part in classification and assignment pro- 
cedure. (See ch. 6.) They are seriously limited 
by the fact that they are time consuming and lack- 
ing in complete objectivity. The examiner may 
fool himself and the examinee may fool him. More- 
over, questionnaires and interviews lack precision, 
not only because military assignments differ from 
civilian occupations, but also because they cannot 
result in dependable comparisons between man 
and man nor provide the exact data required for
prediction. It is necessary to sample the perform- 
ance of soldiers by tests which measure and predict 
in a truly scientific fashion. 

13. Scientific Measurement 

Progress in science goes hand in hand with the 
development of measuring instruments. Primitive 
man could weigh an object by “hefting” it to judge 
how heavy it seemed; the weighing of bits of matter 
too small to be seen awaited the development 
of incredibly delicate instruments. The early 
physician judged the temperature of his patient
by placing a hand on his fevered brow. Today he 
puts a thermometer in the patient’s mouth and 
notes the height of the column of mercury along a 
scale. Modern measurement is taken for granted,
but it has several important characteristics that 
must be kept clearly in mind before classification 
testing can be understood. 

a. The measurement is indirect. The phenomenon
or characteristic being determined is measured
by noting its effect on some other phenomenon 
that can be observed. Temperature, for example, 
cannot be seen, nor can it be directly sensed with
any degree of accuracy. Yet because the physicist
has established a constant and invariable relation 
between temperature and the expansion of mercury, 
the simple and direct observation of the thermom- 
eter can be used as a measure of temperature. 
Moreover, this measure of temperature is itself 
obtained because it is an indirect indication of some- 
thing else that the physician is really concerned 
with, namely, the patient’s health. 

b. The measurement is objective. That is, 
the result obtained is almost completely independent 
of the person doing the measurement. Experience 
shows that subjective estimates and judgments 
are influenced by many factors that have to do with 
the observer rather than the phenomenon he is 
observing. An object may feel warm or cool 
depending upon the observer’s own temperature 
or upon his expectations, desires, suggestibility, 
prejudices, or a number of other irrelevant factors. 
The thermometer has no personal bias. It will 
yield the same result to all observers providing 
only that they can read it properly. 

c. The measurement is reliable. It does not 
produce one result at one time and something quite 
different a moment later. This consistency is 
necessary in measurement, since without it there 
is no way of knowing whether a change in results 
may be confidently attributed to a real change in 
the phenomenon measured or to a variation in 
the instrument. 

d. The measurement is sensitive. It permits 
the discovery of small variations or fine discrim- 
inations in the characteristic being measured. 
Crude, unaided judgment can easily distinguish 
between extremes of hot and cold. But few
men could tell the difference between temperatures 
of 99° and 102° without the aid of a thermometer; 
and it is precisely such small variations that are 
of critical importance in medicine. 

e. The measurement is meaningful, in the sense
that it can be interpreted correctly and usefully 
by any trained person. Any given reading on a 
good instrument always means the same thing 
because it is higher or lower or more or less than a 
standard reference or “bench mark” on a scale 
which is also standardized. On the centigrade 
thermometer, for example, any obtained reading 
can be interpreted by reference to the temperature 
at which water freezes, and the temperature at which 
it boils. This is possible because the thermometer 
is scaled to these two “reference points” which 
are definite and independent of the measuring 
instrument. In contrast, such subjective estimates
as “fairly cool” or “very hot” are merely
personal opinions which may mean quite different 
things to different people and are therefore of little 
use in measurement. 
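
The role of the two reference points described in e above can be sketched in a few lines. This is an illustration only. It assumes that the height of the mercury column rises in direct proportion to temperature, in keeping with the constant relation mentioned in a above. The function name `centigrade_reading` and the column heights are hypothetical.

```python
def centigrade_reading(column_height, height_at_freezing, height_at_boiling):
    """Convert a mercury column height to degrees centigrade.

    The scale is fixed by two bench marks: the column height when water
    freezes (0 degrees) and the height when it boils (100 degrees).
    Readings in between are found by simple proportion (an assumption).
    """
    span = height_at_boiling - height_at_freezing
    if span == 0:
        raise ValueError("the two reference heights must differ")
    return 100.0 * (column_height - height_at_freezing) / span


# Hypothetical heights in millimeters: 20 at freezing, 120 at boiling.
print(centigrade_reading(57.0, 20.0, 120.0))  # 37.0
```

Because the reading depends only on the two bench marks and the observed height, any observer using the same instrument arrives at the same number, which is the sense in which the measurement is both objective and meaningful.
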

14. Army Tests as Scientific Tools 

Scientific measurement is applied to the problem 
of determining the skills and aptitudes of men in 
the Army. Though psychologists use paper-and- 
pencil tests for the most part rather than mechani- 
cal devices, the tests are nevertheless accurate and 
useful instruments analogous to thermometers, 
scales, and other aids to physical measurement. 
Those developed and used by the Army in con- 
nection with the selection and assignment of men 
yield results that are objective, reliable, and mean- 
ingful. Army tests are especially constructed to 
serve as the measuring instruments of the classi- 
fication system. As such, they are used to deter- 
mine the skills and aptitudes of Army men in order 
that they can be selected for training courses and 
assignments that are suited to their abilities. These 
tests are used because experience has amply demon- 
strated that they result in better selection; they 
save the Army time, money, and facilities and make 
all echelons more efficient. The reasons for their 
superiority over other techniques are to be found 
in the fact that the Army tests have the same 
characteristics as all scientific measuring devices. 

Section III. ARMY TESTS AS SCIENTIFIC
TOOLS FOR MEASURING SKILLS 
AND APTITUDES 

15. Indirect Measurement of Traits 

As this chapter has already shown, psychological 
traits cannot be observed directly. But the prod- 
ucts of these traits can be reviewed, examined, 
and evaluated. In other words, the differences 
between men in the amounts of some skill or capac- 
ity they possess can be determined by measuring 
and evaluating their differential performance on 
tests involving those skills. A soldier’s skill in 
mechanics can be determined by observing his 
performance on mechanical tasks or his response 
to questions involving mechanical principles and 
practices. The test poses questions and problems
which, for their correct solution, involve the char- 
acteristics or traits to be measured. The number 
of such questions and problems which the soldier
can respond to correctly is an indirect measure of
the amount of the trait he possesses. 

16. Uniformity 

The questions that may be asked about a topic 
are almost unlimited in number, and can be phrased 
in many different ways. It follows that questions 
which are not uniform lead to answers so various 
that they reveal nothing useful in comparing men 
with one another. It is characteristic of informal 
questioning techniques that now some questions 
are asked, now others; that the manner of wording 
or presenting the questions varies; and that the 
interviewer is at one time tolerant and at another 
time critical in his evaluation of answers. The 
test, on the other hand, is composed of a standard 
series of questions selected by careful scientific 
techniques. (See ch. 3.) Thus, all individuals 
to be tested are given identical questions or tasks, 
presented in a uniform prescribed manner. More- 
over, in the typical Army test, the method of indi- 
cating answers is simplified to the point where 
they can be scored by mere counting or by a wholly 
impersonal machine. The whole process is analo- 
gous to that of applying a scale and reading off the 
result. Any reasonably experienced examiner can 
obtain the same measurement by following in- 
structions. 
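
Scoring "by mere counting" can be pictured as follows. The sketch assumes a test scored as the number of answers that agree with a fixed key. The key, the answer sheet, and the function name `score_answer_sheet` are hypothetical, and an actual Army test may apply scoring rules not shown here.

```python
def score_answer_sheet(responses, key):
    """Count the responses that match the scoring key.

    An omitted item may be recorded as None; it never matches the key.
    """
    if len(responses) != len(key):
        raise ValueError("answer sheet and key must cover the same items")
    return sum(1 for given, correct in zip(responses, key) if given == correct)


# Hypothetical five-item key and one soldier's answer sheet.
KEY = ["b", "d", "a", "c", "a"]
sheet = ["b", "d", "c", "c", "a"]
print(score_answer_sheet(sheet, KEY))  # 4
```

Nothing in the count depends on who does the counting, so two examiners, or an examiner and a machine, obtain the same score from the same sheet.
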

17. Reliability 

No measuring instrument will produce results 
with perfect consistency. Even a series of succes- 
sive measurements of a wooden plank will contain 
variations. But for satisfactory carpentry, any 
one of these estimates is close enough to be con- 
sidered the “true” length. Should any two meas- 
ures differ by as much as a few inches, however, 
the carpenter’s confidence in his rule is such that 
he would sooner conclude that he was measuring 
different planks than he would suspect the con- 
stancy of his rule. Psychological measurement 
is perhaps not so consistent as this; the plank is 
not so fickle as the men. Yet the estimates of 
the talents of men obtained with the use of Army 
tests are sufficiently close to their “true” measures 
to be employed with confidence for the classi- 
fication purposes for which they are designed. 

18. Significance 

Test measurements are meaningful. The score 
on an Army test is not merely a number that is 
large or small, high or low. It is, first, a number 
that signifies the amount of a specified attribute 
which a man possesses. According to the way in 
which the test is constructed (see ch. 3), the score 
will indicate an aptitude for a given training course 
or for a given assignment. And secondly, the 
score is a number that signifies how much of the 
required aptitude the soldier possesses in com- 
parison with all other men in the Army. 

19. Economy of Time and Ease of 
Administration 

a. No matter how objective, reliable, and 
significant measurements may be, they are of little 
value to the Army unless they can be obtained 
quickly and easily. An Army is engaged in a 
constant race against time. Decisions cannot be 
delayed for weeks or months, but must often be 
made in a matter of hours. Most Army tests 
are of the paper-and-pencil variety that can be 
given to hundreds of men at one time. Also, most 
of the tests used by the Army can be scored quickly 
and accurately by means of the scoring machine. 

b. The tests developed by the Army are different 
from most other tests in one important respect. 
Because the selection problem is not limited to 
any one place or type of installation, it is impossible 



to procure enough men previously trained as per- 
sonnel experts or examiners to carry out the whole 
program. As a consequence, much of the Army 
testing is of necessity done by men selected and 
trained for that work in the Army. In order to 
simplify the job, and to reduce to a minimum the 
demands made upon the individual judgment of 
the examiner, Army tests are so constructed that 
they do not require the services of highly skilled 
specialists in testing. The directions are uniform 
and explicit and conveyed in non-technical lan- 
guage. In fact, many of these tests are practically 
self-administering. Another advantage gained by 
making tests as nearly self-administering as 
possible is consistency in the way a test is given. 
In order to be fair to all men, the results of every 
test run must be comparable directly with the 
results of every other administration of the same 
test. To make test results comparable, the tests must 
be given under the same conditions. The more 
individual judgment, skill, and invention play a 
part in administration, the greater the chance for 
variation. The Army has therefore left as little 
as possible to be decided by the examiner. Hence, 
the best examiner is the one who follows directions 
most closely and most intelligently. 
CHAPTER 3 

HOW THE ARMY CONSTRUCTS TESTS 



Section I. PLANNING TEST 

20. General 

Psychological tests are the measuring instruments 
of the classification system. They serve the same 
general purposes as physical tests, enabling the 
Army to weed out the completely unfit and to select 
for various assignments on the basis of characteristics 
which make men likely to succeed. Moreover, 
they predict how well any man will perform 
assignments in training or in the field. Every 
Army test makes it possible to observe the be- 
havior of each man accurately and in exactly the 
same way as others are observed. Every test 
is capable of providing data which is accurate 
enough and reliable enough to result in a higher 
percentage of correct predictions than would be 
possible without it. Every test furnishes a method 
of recording the results in exactly the same way 
each time it is used. The results of every test 
may be interpreted in the same way by everyone 
who makes use of them. But no instrument is 
better than the people who use it. Therefore, 
all personnel in the classification system itself and 
all officers exercising a command function in regard 
to classification, assignment, or redeployment, 
require a thorough understanding of the nature 
of Army tests and the proper interpretation of 
scores. Such understanding can best be conveyed 
by a brief description of the principles and practice 
of test making. 

21. Tests Are Designed to Meet Specific 
Army Needs 

The first step in test making is to study the classi- 
fication problem to be solved. The particular 
need of the Army must be clearly defined both 
in terms of the assignment as a job of work or a 
course of study, and in terms of the number of 
men needed and the apparent supply. Only when 
the problem is urgent and important is the difficult 
and time-consuming process of test making war- 
ranted. If a rating scale, a questionnaire, or a 
survey of past experience will do just as well, or 
if the number of men and jobs involved is small, 
the construction of a special test is not warranted. 



The test-maker, as will appear later, must con- 
tinually bear in mind the practical purpose to 
which the test will ultimately be put. Moreover, 
study of the problem also brings to light the char- 
acteristic which the test should measure. It is 
necessary to find a characteristic which is possessed 
in high degree by most (if not all) of the men who 
have demonstrated their ability in a particular 
course or assignment. By measuring the amount 
of this trait possessed by untried men, it is possible 
to predict their performance quite accurately. 
In some cases, it is a very simple matter to find a 
trait highly correlated with success — successful 
carpenters possess carpentry skill, and it was no 
great feat to discover that this was the trait to 
measure in order to select the Army’s carpenters. 
In other cases, especially highly complex and 
modern assignments, the particular traits must 
be discovered through experiments made after 
the general purpose of the test is decided upon. 
For example, it was decided to measure “cryptography 
aptitude” in order to predict which men 
would be most likely to pass a course in cryptog- 
raphy satisfactorily. But it took considerable 
research to find which particular traits make up 
cryptography aptitude and choose those most 
highly correlated with successful performance in 
the course. (See par. 32c.) It is essential to 
select for measurement traits which are actually 
to be found in the men themselves and not in the 
circumstances of a training course or assignment. 
If, for example, men pass or fail a course because 
of the whim of the instructor or the season of the 
year, there is no use in devising a test to discover 
the most likely candidates. 

22. Suiting Test to Its Purpose 

The use to which a test will be put determines 
the kind of test to be developed and the form it 
shall be given. 

a. Achievement tests are used for the purpose 
of finding the men who, without further training, 
will be most likely to succeed in a particular as- 
signment. Achievement tests are, therefore, most 
often used in selecting men for direct assignment 
to some tactical or service organization. Achievement 
tests predict by discovering and measuring 
what the examinee already knows, or how skillful 
he has become, or both. 

b. Aptitude tests predict which men will be 
most likely to complete a training course in a 
minimum time and in a satisfactory manner. They 
are, therefore, most useful in selecting from among 
a mixed lot of available soldiers the most promising 
trainees for a particular course. They predict 
success by measuring the degree to which examinees 
possess traits found in men who have been success- 
ful in the particular training regime for which 
selection is being made. 

c. Aptitude may be measured by an achieve- 
ment test, provided that the possession of certain 
knowledge and/or skill indicates aptitude 
for acquiring related knowledge and skill. For 
example, the Army carpentry test can be used to 
choose men to be trained as general utility repair- 
men, because carpentry skill indicates aptitude 
for this specialty. 

d. Army tests are as specific as may be necessary 
to accomplish a particular purpose. For example, 
it is necessary for general classification purposes 
to predict probable success in a wide variety of 
assignments in which general learning ability 
(sometimes defined as “intelligence”) is the most 



important factor. Consequently, the Army General 
Classification Test measures a wide variety of 
knowledge and ability such as most people acquire, 
to greater or lesser degree, in civilian life: ability 
to read, knowledge of word-use, arithmetic, and 
so on. The Mechanical Aptitude Test is more 
limited in range, but still far from being completely 
specific inasmuch as it measures aptitude associated 
with success in any one of a variety of mechanical 
assignments. The Army Radio Code Aptitude 
Test is highly specific, since it predicts the men 
most likely to succeed in a very specialized and 
homogeneous group of assignments, and measures 
only the traits which are correlated with performance 
in these assignments. The Army thus 
has a wide range of tests, from the highly general 
to the highly specific, each of them adapted to a 
somewhat different classification problem. In con- 
structing a new test, the Army psychologists choose 
general or specific material according to the use 
for which the test is intended. 

23. Form of Test 

Two principal factors determine the form in which 
a test is cast: the purpose to be served and practi- 
cability. A test is a field instrument and must be 
designed to serve efficiently under conditions likely 
to be found in the field. It must be adaptable to 
Army necessities which limit the time that may be 
allotted to testing. 

a. Verbal Form of Test. A verbal test is 
one in which the examinee is required to talk, write, 
or mark correct responses stated in language. 
The Officer Candidate Test (OCT-1 and -2) is a 
typical example. The type of verbal test most 
commonly used is a paper-and-pencil test admin- 
istered to groups. A verbal test of this kind can 
be administered and scored in a short time and 
with great efficiency and does not require the 
presence of a highly trained specialist. Test 
situations can be made uniform, and highly objec- 
tive standards of scoring may be employed. A 
wide range of ability and knowledge may be sampled 
in a relatively short time. When a large number 
of men is to be tested, or highly trained personnel 
is not available, a verbal test is likely to be the 
most practicable, unless it is clearly unsuited to 
the purpose at hand. It should be borne in mind 
that a verbal test may measure either achieved 
skill or aptitude. A man who can answer certain 
questions concerning a job, or comprehend selected 
written passages about it, can usually do the job or 
learn to do it. 

b. Performance Form of Test. A performance 
test is one in which the examinee is required 
to manipulate objects, deal with visual 
materials such as pictures and patterns, or make 
practical application of knowledge. It is the 
most efficient form to use when it is necessary to 
observe how the examinee does the job as well as 
his ability to do it. Performance tests involving 
an actual work sample, as for example the making 
of a mortise-and-tenon joint as a test of carpentry 
skill, bear an evident relation to the assignment 
and are both practical and interesting. The 
practicability of performance tests is limited by 
the fact that they are time-consuming and usually 
require highly trained personnel for their adminis- 
tration. They are, however, indispensable in test- 
ing illiterates and men whose knowledge of English 
is limited. Verbal directions may be very simple 
and require no response in language. The Group 
Target Test (GT-1) is an example of the per- 
formance test in which language is minimized in 
order to discover the aptitudes of men deficient 
in the use of English. Nonlanguage tests require 
no speaking, reading, or understanding of language 
on the part of the examinee in connection with 
either directions or response. They are used 



almost exclusively to test men who do not under- 
stand English. 

c. Group Tests. Verbal (pencil-and-paper) 
tests are widely employed by the Army because 
they are best adapted to group testing. The 
large number of men to be classified and the pres- 
sure of time make it necessary to test in groups 
whenever possible. Performance tests, by their 
very nature, are better adapted to the testing of 
individuals. It is sometimes possible, however, 
to devise a performance test to be given to groups; 
an example is the Group Target Test, employed 
in induction stations. (See ch. 7.) 

d. Individual Tests. Individual tests may 
be of either the verbal or performance form, de- 
pending upon the particular purpose to be achieved, 
or a test may involve both performance and verbal 
responses, as in the case of the Army Individual 
Test (AIT-1). Individual tests are constructed 
to accomplish purposes for which group tests are 
not suited. These purposes are: 

(1) Screening and classifying men whose lan- 
guage deficiencies or other personal characteristics 
render them unable to demonstrate their abilities 
adequately on a group test. The Individual 
Target Test (IT-1) is an example. 

(2) Testing aptitudes or proficiencies which 
can best be revealed through actual work samples 
or manipulation of performance materials. The 
Distributor and Valve Test (TC-15a) is an example 
of this type of test. 

24. Items 

Having decided upon the trait or traits to be 
measured and having determined the form which 
the test is to take, the psychologist’s next step is 
to determine which, among the several possible 
kinds of questions or problems, will most effectively 
reveal the desired information about each man to 
be examined. Questions and problems are called 
test items. In performance tests, the items are 
usually problems involving cards, blocks, tools, 
materials, etc. In paper-and-pencil tests, the items 
are questions of two types: 

a. Free-Answer Items. These may be sen- 
tences containing a blank space in which the examinee 
is instructed to write a word which makes the 
statement complete and correct, as in the follow- 
ing example: 

The capital of ________ is Boston. 

Free-answer items may also be questions which 
the examinee is required to answer by writing a 
word or phrase or sentence, as in the following: 

In the Diesel engine, how is the gas mixture 
in the cylinder ignited? 

There are situations in which it is desirable to 
discover not merely whether the soldier can rec- 
ognize the solution to a problem but how he arrives 
at the solution or how he expresses himself in 
giving the answer. Where this is the aim of the 
test, the free-answer item is employed. It should 
be recognized that in giving credit for answers 
to this type of item, much will depend upon the 
personal judgment of the scorer as to what consti- 
tutes a good, an acceptable, or an entirely un- 
satisfactory solution or expression. To avoid these 
difficulties, “restricted answer” items are employed 
when practicable. The commonest form of this 
type is the multiple-choice item. 

b. Multiple-Choice Items. Multiple-choice 
items present the examinee with several (usually 
four or five) answers to a question. His problem 
is to choose and write down the correct answer. 
Examples: 

Boston is the capital of — 

A) Maine 

B) Montana 

C) Massachusetts 

D) Minnesota 

In the Diesel engine, the gas mixture in the 
cylinder is ignited by the — 

A) Spark 

B) Heat generated by compression 

C) Ignition system 

D) Firing order of the cylinders 

Multiple-choice items are preferred for most test- 
ing purposes for the following reasons: Scoring 
is more objective because the right answer is al- 
ready set down and is neither arguable nor subject 
to the varying judgments of testing personnel. 
The examinee has only to recognize the answer, 
and is not burdened by having to search for it in 
his mind and then phrase it in his own way. Be- 
cause the examinee does not have to write, merely 
being required to check the correct answer, he can 
cover many multiple-choice items in a given time. 
As will appear later, it is an advantage to cover a 
considerable number of items. Multiple-choice 
items can be scored by machine, which increases 
accuracy and saves time. (See ch. 4.) 
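
The objective scoring described above amounts to comparing each marked answer with a key and counting. A minimal sketch follows, using the two sample items given earlier. The function name score_answer_sheet, the letter coding, and the use of None for an omitted item are assumptions made for illustration and do not describe the scoring machine itself.

```python
def score_answer_sheet(answers, key):
    """Count right, wrong, and omitted items by comparing answers with the key.

    answers and key are equal-length lists of letters. None marks an omitted
    item (a hypothetical convention chosen for this sketch).
    """
    if len(answers) != len(key):
        raise ValueError("answer sheet and key must cover the same items")
    right = wrong = omitted = 0
    for marked, correct in zip(answers, key):
        if marked is None:
            omitted += 1
        elif marked == correct:
            right += 1
        else:
            wrong += 1
    return right, wrong, omitted


# Key for the two sample items: Boston -> C, Diesel ignition -> B.
key = ["C", "B"]
right, wrong, omitted = score_answer_sheet(["C", "A"], key)
print(right, wrong, omitted)
```

Because the count depends only on the marks and the key, any two scorers applying it to the same answer sheet obtain the same result.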

25. Length 

To include all the items pertinent to a given trait 
would make the test absurdly and inefficiently 



long. The principle which governs the number of 
items selected — and therefore, the length of the 
test — is that there must be enough to show the 
degree to which each examinee possesses the trait, 
and show this in a measurable fashion so that men 
can be compared with one another in terms of the 
trait. Classification testing employs the same 
sampling principles followed in other fields of 
measurement. In grading a carload of wheat, 
for example, it is not practicable to examine the 
whole lot in order to compute the percentage of 
high-quality grain, of chaff, and of foreign materials. 
The examiner instead gathers samples, assays these, 
and assumes that the characteristics of the whole 
carload are the same as those for the samples which 
he tested. But he would be extremely naive if 
he took all his samples from the top of the car. 
The unscrupulous vendor could easily have filled 
the car with an inferior grade of wheat and placed 
a thin layer of first-class stock on top. The sample 
taken from this top layer would not be representative 
of the whole, and the measurement based on this 
sample would be an exceedingly inaccurate de- 
termination of the quality of the entire lot. Aware 
of all the pitfalls of careless sampling, and wishing 
his sample to be representative of the whole car- 
load, the examiner collects a number of smaller 
samples — from the top and bottom of the car, 
from different depths, from each end, and from the 
middle. The more samples he collects, the more 
accurate his grading, since with only a few, chance 
discrepancies loom large in the total. The car may 
contain a concentration of inferior wheat, con- 
stituting a very small fraction of the total amount. 
If the examiner takes only five samples, and happens 
to take one of them from this small concentration, 
the inferior grade will comprise one-fifth of his total 
sample, and measurements based on it will not be 
characteristic of the whole carload. So with test- 
ing, the larger the number of items, the greater the 
accuracy of the test, for two reasons: the larger 
number provides more complete coverage of the 
whole content and at the same time insures that 
any small pockets of ignorance on the part of the 
examinees will not be given disproportionate weight 
in the final result. There is a second consideration, 
a practical one, that enters into the determination 
of length of the test. The Army cannot afford to 
devote an excessive amount of training time to 
the administration of tests. Furthermore, when 
a test becomes overlong, the effects of fatigue and 
boredom are apt to diminish the accuracy of the 
results. In practice, the test-maker includes as 
many items as are necessary to make the test 
sufficiently accurate for sound classification, but 
keeps it short enough to be practical under field 
conditions and within the physical capacities of 
the average man. 
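
The effect of the number of samples on accuracy can be illustrated by a small simulation. The sketch below assumes a carload in which 5 percent of the wheat is inferior and draws samples at random from the whole car; the function name estimate_spread, the 5 percent figure, and the number of trials are illustrative assumptions, not figures taken from this manual.

```python
import random
import statistics


def estimate_spread(true_fraction, n_samples, n_trials=2000, seed=0):
    """Spread (standard deviation) of the estimated inferior fraction.

    Each trial draws n_samples at random from the whole carload and records
    the share of them that turned out to be inferior.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        inferior = sum(1 for _ in range(n_samples) if rng.random() < true_fraction)
        estimates.append(inferior / n_samples)
    return statistics.pstdev(estimates)


# With five samples, a single inferior sample already makes up one-fifth
# of the total; with one hundred samples it makes up one-hundredth.
print("share of one sample among 5:  ", 1 / 5)
print("share of one sample among 100:", 1 / 100)
print("spread with 5 samples:  ", estimate_spread(0.05, 5))
print("spread with 100 samples:", estimate_spread(0.05, 100))
```

The estimates based on the larger number of samples cluster more closely about the true figure, which is the reason a longer test gives a more dependable score.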



A GOOD TEST SAMPLES ALL PARTS 
OF THE SUBJECT MATTER FIELD 

(Panels labeled FIELD and TEST.) 

Figure 6. 



26. Time Limits 

Time limits are extremely important because they 
help to determine the qualities measured by a set 
of items. A test made up of items which are equal 
in difficulty, measures speed if the time limit is so 
short that no one can finish all the items. A test 
in which the items get harder and harder measures 
power if examinees are given all the time they need 
to complete as many items as they possibly can. 
Most Army tests measure both power and speed, 
since war neither waits upon the slow-but-sure 
nor makes allowances for the lightning blunderer. 
The Army must find out both how well and how 
fast a man can be expected to perform in a given 
assignment. The Officer Candidate Test is a good 
example. It is made up of questions and problems 
which become harder as the examinee forges through 
them. Only the man who is both able and quick 
can complete most of the 70 items in the 45 minutes 
allowed to the test. 

Section II. CONSTRUCTING TEST 

27. General 

Having “blueprinted” the test by making basic 
decisions as to the trait to be measured, the form 



and character of the test, its probable time limits 
and length, the next step is to build a model which 
can be given a thorough trial. The actual build- 
ing consists of writing items and preparing di- 
rections for administering and scoring the test. 

28. Construction of Test Items 

The first step is to assemble a large collection of 
questions or problems which are related to the trait 
to be measured. When necessary, an expert con- 
sultant with a highly specialized knowledge of the 
subject matter is called upon to aid in getting this 
material together. The next step is to frame the 
questions or problems into items; this highly technical 
and difficult task is performed by psychologists 
who have made item-writing a specialty. 
Each item is checked to make sure that it is in the 
proper form, clearly phrased, pertinent, and rel- 
evant, and calculated to elicit a response which 
indicates presence or absence of the trait which is 
being measured. The entire collection of items 
is analyzed to make sure that each item contributes 
to adequate coverage of the field. 

29. Preparing Directions 

a. The directions which accompany each set of 
test items constitute a statement of the conditions 
under which the test was “calibrated” or standard- 
ized. Great pains are taken to make these di- 
rections complete and clear. Only by following 
them can the standard conditions (the conditions 
under which the test was standardized) be repeated. 
Unless these same conditions prevail, a 
test gives results as unreliable and misleading as 
a thermometer reading taken when the patient has 
a mouth full of ice. 

b. Two sets of directions are prepared for each 
test. Instructions and suggestions to the examiner 
are included in the manual that accompanies the 
test whenever it is administered. They indicate 
the general conditions under which the test should 
be given, list the materials required to give it, and 
state the time limits for the parts and for the whole test. 
They also suggest introductory remarks that should 
precede, and set the stage for, the administration 
proper, as well as answers to questions that com- 
monly arise during the testing session. The second 
set of directions are the specific instructions to the 
examinees. These are printed as part of the test 
booklet itself to insure that instructions will be the 
same for every administration of the test. Their 
purpose is to make certain that each individual 
examinee understands just what he is expected to 
do and how he is expected to do it. They touch 
upon such details as the advisability of guessing 
when not absolutely sure of the answer, the amount 
of time that will be allowed for the test, the rel- 
ative importance of working for speed as against 
working for accuracy. And they give precise and 
detailed explanations, along with demonstration 
and practice items, of the correct manner of in- 
dicating answers. Since all of these directions 
constitute such a vital part of the test, they are 
prepared by experienced test psychologists and 
subjected to independent check for completeness 
and clarity. 

30. Preparation of Scoring Directions 

a. General. The final step in making a model 
of a new test is to work out the proper technique 
for scoring. The items are first put in the order 
which experts believe will yield the best results. 
The right answers are given a final check to make 
sure that they are clear and matched with the 
questions to which they belong. The position of 
the right answers is adjusted so that they fall in 
truly random positions. That is, the right answers 
are so located that the examinee will not be able to 
“outguess” the test by using or discovering a particular 
pattern of right answers. The truly random 
arrangement makes certain that the number of correct 
guesses made by the examinee will not be greater 
than the number obtainable by pure chance. “Randomization” 
is accomplished by making a “scoring 
key” or adopting one of the standard scoring keys 
furnished with International Test Scoring Machines. 
A scoring key indicates the position of the right 
answer on the answer sheet. The correct answer 
is placed in the identical position in the test book- 
let. The incorrect alternatives are so arranged as 
to reduce to a minimum or eliminate entirely any 
clues that the examinee might derive from the 
sequence of the alternatives. 
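
The randomization of answer positions described above can be sketched as follows. This is an illustration only: the function names make_scoring_key and arrange_item are hypothetical, and the simple shuffling of the incorrect alternatives is an assumption standing in for whatever arrangement the test-maker actually uses to remove clues.

```python
import random


def make_scoring_key(n_items, n_alternatives, seed=None):
    """Pick a random position (0-based) for the right answer of each item."""
    rng = random.Random(seed)
    return [rng.randrange(n_alternatives) for _ in range(n_items)]


def arrange_item(correct, wrong_alternatives, position, rng):
    """Place the correct answer at `position`, with the wrong ones shuffled around it."""
    shuffled = list(wrong_alternatives)
    rng.shuffle(shuffled)
    return shuffled[:position] + [correct] + shuffled[position:]


rng = random.Random(1)
key = make_scoring_key(n_items=2, n_alternatives=4, seed=1)
item = arrange_item("Massachusetts", ["Maine", "Montana", "Minnesota"], key[0], rng)
print("ABCD"[key[0]], item)
```

The key is drawn first and the booklet is then arranged to agree with it, so that the answer sheet and the test booklet always show the right answer in the identical position.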

b. The Scoring Formula. (1) It has been 
found advisable to take further precautions in 
scoring tests where guessing may give a man a 
higher rating than his abilities warrant. The fol- 
lowing example makes clear both the precautionary 
technique itself and the reasons for applying it. 
If an examinee selects one of four alternatives of a 
multiple-choice item by pure guess, his chances of 
selecting the correct alternative are one in four. 
In a large number of such guesses, he will be wrong 
three times for every time he guesses right. On a 
100-item test, for example, he will usually obtain 
about 25 right choices by answering in this fashion. 
Since he got one right answer for every three wrong 
ones, his “true” score can be obtained by the simple 
process of subtracting from the number he got right 
one third of the number wrong, according to the 
formula: 

R - 1/3 W = “true” score. 

Applying the formula to this case (25 minus 1/3 of 
75) will give the examinee’s correct score, zero. One 
further example will illustrate the technique. Let 
it be assumed that two examinees each know the 
answers to 50 items of a test, but that whereas one 
of them stops at this point, the other goes on to 
make pure guesses on the next 20 items and, by 
chance, gets five of them right and fifteen wrong. 
The obtained score of the first examinee will be 
50 and that of the second, 55. Application of the 
scoring formula to both cases, however, will give 
the first man (50 minus 0) 50 and the second man 
(55 minus 1/3 of 15) also 50. It is important to 
note that the fraction in the formula depends upon 
the number of alternatives to each question.* For 
a test composed of items having five answer choices, 
the formula would be right minus one-fourth wrong. 

*The fraction is always 1/(n - 1), where n is the number of alternatives to each item. 

(2) The use of the scoring formula is based on 
the logic of chance. In practice, however, guesses 
are seldom completely blind. An examinee may 
get an item right by knowing which alternative is 
correct or by knowing that the other three are wrong. 
Likewise, if he knows that two are wrong, he will 
have to guess only between the remaining two 
and will, therefore, stand a better chance of picking 
the correct one. Any error that results from 
the application of the correction formula will always 
be in favor of the examinee who utilizes such 
judicious “guessing.” However, there will be 
other, more cautious examinees who may know just 
as much but who will never put down an uncertain 
choice if they are to be “penalized for wrong 
answers.” It has been found that the correction 
formula is not sufficiently helpful in estimating 
the probable success of examinees to justify the 
additional work involved, except in certain tests 
of general ability and tests where pure guessing 
is common. It is, therefore, used only with such 
tests as the Army General Classification Test, 
the Army Radio Code Aptitude Test, and a few 
others. 
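
The correction for guessing described in b above can be worked through in a few lines. The sketch below applies right minus wrong divided by (n - 1), where n is the number of alternatives, to the cases given in the text, and also computes the average credit gained by a guess when some alternatives have been ruled out. The function names corrected_score and expected_gain_per_guess are hypothetical and are used only for this illustration.

```python
from fractions import Fraction


def corrected_score(right, wrong, n_alternatives):
    """Right minus wrong/(n - 1), the correction for guessing."""
    if n_alternatives < 2:
        raise ValueError("an item needs at least two alternatives")
    return right - Fraction(wrong, n_alternatives - 1)


def expected_gain_per_guess(n_alternatives, n_remaining):
    """Average corrected credit for one guess among n_remaining alternatives.

    n_remaining is the number of alternatives the examinee has not been able
    to rule out; the right answer is assumed to be among them.
    """
    p_right = Fraction(1, n_remaining)
    penalty = Fraction(1, n_alternatives - 1)
    return p_right - (1 - p_right) * penalty


print(corrected_score(25, 75, 4))     # pure guessing on 100 four-choice items
print(corrected_score(50, 0, 4))      # examinee who stops after 50 known items
print(corrected_score(55, 15, 4))     # examinee who guesses at 20 more items
print(expected_gain_per_guess(4, 4))  # blind guess
print(expected_gain_per_guess(4, 2))  # guess after ruling out two alternatives
```

A blind guess gains nothing on the average after correction, while a guess between two remaining alternatives gains one-third of a point, which is why any error in the correction favors the examinee who guesses judiciously.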

Section III. TRYING OUT THE TEST 

31. General 

After the planning and construction of the experi- 
mental model of the test, the next step is the “shake- 
down.” The aims of the trial and analysis of the 
new test are, in general, the same as those for any 
other shakedown. A new gun is tested to make 
sure that it will shoot accurately and consistently 
and according to its specifications. A new test 
is given a trial to make certain that it will measure 
specified characteristics which determine the classi- 
fication of soldiers. 

32. Field Trials and Analysis 

In the experimental model of a test, there are many 
more items than will be used in the finished product. 
On the basis of the findings obtained in field studies, 
certain of these items are rejected and the others 
rearranged to make up the test in its final form. 

a. Test Populations. As a first step, the 
experimental form is administered to men who are 
representative of the soldiers who will be classi- 
fied by means of the same test in finished form. 
Tests which will be given at reception centers are 
taken to a typical reception center for trial. A 
test to be used for screening candidates for officer 
training is given to groups of men who meet all 
other requirements for selection, as is the case with 
the entering classes in officer candidate schools. 
It is of the utmost importance that the group on 
which a test is tried out be a truly representative 
sample of the group to be classified by means of it. 
If items destined to form part of a test for all re- 
ception center men are administered in the “shake- 
down” stage to a single group arriving from sub- 
marginal communities in the hinterlands, no clear 
or dependable indication can be gained of the effectiveness 
of these items in measuring the traits of men 
from other, more fortunate sections. If a test is 
to be used with all men at training centers to select 
for specialist training, it must not be given experi- 
mentally to trainees already enrolled in the course. 
A group is said to be representative of its parent 
population if all its various characteristics are the 
same as those of the larger group from which the 
men to be tested are drawn or, in more technical 
terms, if each member of the parent population has 



an equal chance of being included in the sample 
group. These conditions are obviously not pres- 
ent in the cases cited above. In the training 
center example, all trainees would not have the 
same chance of being included in the specialist 
group, since this class has already been selected 
in some fashion, and would, therefore, rate higher 
on those traits for which they were selected than 
would the whole training center group. 




Extreme care is exercised to make sure that the 
sample group to which the test is administered 
has all the characteristics of the parent population. 
Whenever practicable, the desired representative- 
ness is achieved by testing a truly random sample. 
Representativeness is achieved in other cases by 
selection controlled with respect to general intellec- 
tual capacity, age, color, education, and any other 
characteristic that might be related to performance 
on the test items. 
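
The two ways of obtaining a representative trial group can be sketched as follows. The first function draws a truly random sample, in which every man has an equal chance of inclusion; the second controls selection so that one characteristic appears in the sample in the same proportion as in the parent population. The function names, the record layout, and the use of education as the controlled characteristic are assumptions made for illustration, and the rounding of quotas means the controlled sample may differ slightly from the requested size.

```python
import random
from collections import defaultdict


def simple_random_sample(population, k, seed=None):
    """Draw k members so that each member has an equal chance of inclusion."""
    rng = random.Random(seed)
    return rng.sample(population, k)


def controlled_sample(population, characteristic, k, seed=None):
    """Draw about k members, keeping each group's share the same as in the population."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for member in population:
        groups[characteristic(member)].append(member)
    sample = []
    for members in groups.values():
        quota = round(k * len(members) / len(population))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


# Hypothetical parent population: 60 men with grade school, 40 with high school.
population = [
    {"id": i, "education": "grade school" if i < 60 else "high school"}
    for i in range(100)
]
trial_group = controlled_sample(population, lambda m: m["education"], 10, seed=3)
print(sorted(m["education"] for m in trial_group))
```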

b. Difficulty and Discriminative Index of 
Items. The experimental model of the test, then, 
is administered to a group representative of that 
for which it is designed, and the results analyzed 
to determine the difficulty and “discriminative 
index” of each item. Difficulty is not determined 
by the subjective estimate of the test-maker, nor 
the consultant, nor any opinion that the item
“should be easy for anyone claiming to be familiar
with the content of the test.”

Figure 8. A discriminating item distinguishes between those men with
low scores and those with high scores on the total test. (Panels:
discriminating, non-discriminating. Legend: men answering test item
correctly, men answering test item incorrectly.)

The difficulty of a
test item is proved by facts; that is, by the pro- 
portion of men in a representative group who actu- 
ally do answer the item correctly. The computation 
of item difficulty, therefore, involves the simple 
but tedious task of counting, for each item, the 
number of examinees who received credit for that 
item during the experimental run of the test. 
Difficulty is expressed in percentage form; a diffi- 
culty of 70, for instance, means that 70 percent of 
the group answered the question correctly. This 
would be a relatively easy item. An item having 
a difficulty of 28, that is, an item answered correctly 
by only 28 percent of the group, is relatively hard. 
The term discriminative index is used by the Army 
to designate the value a particular item has in 
ranking men according to the varying amounts of 
the tested trait which they possess. If, for example,
the individuals who answer a given item
correctly tend to answer most other items correctly,
while those who fail on the item receive low total 
scores, then the item will be contributing to the 
differentiating function of the test. It will have a 
high discriminative index. An item which, regardless
of its difficulty, is answered correctly as often by those who
fail most of the other items as by those who get high
total scores, does not add to the ability of the test 
to indicate the differences between men. If an 
item is answered correctly by all the examinees, it 
merely adds the same amount to the score of each 
individual without in the least affecting their rel- 
ative ranks. The same is true of an item which 
no one can answer, and true also for an item which 
is answered by the same proportion of low-ranking 
and high-ranking men. 
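
The two item statistics can be sketched as follows. Difficulty follows the definition given above (percent of the group answering correctly). The manual does not state a formula for the discriminative index, so the upper-group minus lower-group difference used here, and the 27 percent group size, are assumptions chosen for illustration. Each examinee's answers are represented as a list of 1 (credit) and 0 (no credit).

```python
def item_difficulty(responses, item):
    """Percent of examinees who received credit for the item.
    A higher value means an easier item."""
    correct = sum(r[item] for r in responses)
    return 100.0 * correct / len(responses)


def discriminative_index(responses, item, fraction=0.27):
    """Proportion correct in the highest-scoring group minus the
    proportion correct in the lowest-scoring group, with groups
    formed on total test score (assumed formula)."""
    ranked = sorted(responses, key=sum)          # lowest total score first
    n = max(1, int(len(ranked) * fraction))      # size of each extreme group
    low, high = ranked[:n], ranked[-n:]
    p_low = sum(r[item] for r in low) / n
    p_high = sum(r[item] for r in high) / n
    return p_high - p_low
```

An item answered correctly by everyone, or by no one, yields an index of zero under this sketch, which agrees with the text: such an item does not change the relative ranks of the men.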

c. Criteria and Validity. No a priori or 
subjective opinion will tell with enough certainty 
for Army purposes whether an item actually has 
anything to do with the trait to be measured.
Evidence in large quantities is required. To get 
that evidence, a criterion against which the item 
itself can be measured must be established. Here 
the psychologist’s problem is more complicated 
than that of the maker of familiar measuring 
instruments. The length of a yard has been agreed 
upon for a long time; therefore, a precise yard 
kept under constant temperature at the Bureau 
of Standards is a criterion for all yardsticks. The
psychologist must measure his test against actual 
performance. An item is known to be valid when 
there is acceptable proof that it has value in pre- 
dicting the particular performance taken as a 
criterion. Success in some assignments requires 
that men have a number of well-defined character- 
istics. For example, men who are proficient in 
cryptography are able to encode and decode at 
satisfactory speed, know various codes, and make 
sound and rapid deductive judgments. All of these 
are criteria of cryptography aptitude. When it 
can be proved that an item helps to predict the 
likelihood that a man will demonstrate any of 
these phases of cryptography aptitude, the item 
is in accord with the criterion and therefore valid 
for a cryptography test. The Army tests items 
against criteria by making a second tryout of the
test. 

d. Relation of Criteria to Actual Per- 
formance. The experimental items are admin- 
istered to men who are about to enter a course or 
enter upon an assignment. Subsequent perform- 
ance of these men is carefully observed. All 
significant phases of their performance are re- 
corded so that each test item can be compared with 
the most complete measure of the criterion and 
its validity thus determined. For example, the 
speed in encoding and decoding achieved with the 
various cryptographic devices by men in the cryptog- 
raphy course could be used as a criterion. Only 
items answered correctly on the test by a high 
percentage of men who later achieved satisfactory 
speed, but answered incorrectly by a high per- 
centage of those who failed to achieve such speed, 
would be considered valid in and of themselves. 
Items selected by this criterion would predict 
only this phase of cryptographic aptitude. To 
predict all phases — speed, knowledge of various 
codes, methods and deductive reasoning — a compos- 
ite criterion is used. 
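
The comparison of test items with a criterion can be sketched as below. The manual describes the principle (an item is valid when men who later meet the criterion answer it correctly more often than men who do not) but gives no formula; the difference in proportions and the weighted composite are assumptions made for the example, and the weights are purely illustrative.

```python
def item_validity(responses, succeeded, item):
    """Proportion correct among men who met the criterion minus the
    proportion correct among men who did not (assumed measure)."""
    passed = [r[item] for r, ok in zip(responses, succeeded) if ok]
    failed = [r[item] for r, ok in zip(responses, succeeded) if not ok]
    if not passed or not failed:
        raise ValueError("criterion must divide the group into two non-empty parts")
    return sum(passed) / len(passed) - sum(failed) / len(failed)


def composite_criterion(measures, weights):
    """Weighted sum of several criterion measures for one man,
    e.g. speed, knowledge of codes, deductive reasoning."""
    return sum(weights[name] * value for name, value in measures.items())
```

A value near zero from `item_validity` indicates an item that tells nothing about later performance on that criterion.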



33. Selection of Test Items 

The field studies and analysis of the experimental
form of the test will have furnished data on the three 
main characteristics of each test item — its diffi- 
culty, its discriminative index, and its validity. 
The next step is to select the items which will 
make up the final form of the test in such manner 
that the finished product will be a carefully cali- 
brated, reliable, and valid instrument. This selec- 
tion is always based on the three characteristics 
of the item mentioned above, but it should be noted 
that these characteristics may be wholly unre- 
lated to each other. An item of high validity may 
be difficult or easy, and it may have high or low 
discriminative value. It is accordingly impossible 
to select on the basis of any one characteristic at 
a time. All three must be taken into account. 

a. Length and Difficulty. The number of 
items at each level of difficulty is determined by 
the purpose of the test. If this purpose is to 
make the most efficient division of the population 
into a high and low group, with reference to the 
trait in question, the difficulties of the items selected 
cluster around the division point. More specif- 
ically, if it is desired to qualify the top 30 percent 
of a population for specialist training or assign- 
ment, then the difficulties of the items selected 
should cluster around 30 percent (items answered
correctly by 30 percent of the population). If,
however, as is usually the case, it is desired to grade
the whole population from highest to lowest with 
reference to a trait, rather than merely to divide 
into two groups, the difficulties of the selected 
items should be spread over most of this range. 
In most tests the item difficulties will be fairly 
evenly distributed over the range from 30 percent 
to 70 percent. 
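
The two selection patterns just described can be sketched as simple filters on item difficulty. The function names and the 10-point tolerance are assumptions for illustration; the 30 to 70 percent range is the one given in the text.

```python
def items_near_cut(difficulties, cut_percent, tolerance=10):
    """Indexes of items whose difficulty lies within `tolerance`
    percentage points of the division point."""
    return [i for i, d in enumerate(difficulties)
            if abs(d - cut_percent) <= tolerance]


def items_spread(difficulties, low=30, high=70):
    """Indexes of items whose difficulty falls in the range used when
    the whole population is to be graded from highest to lowest."""
    return [i for i, d in enumerate(difficulties) if low <= d <= high]
```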

b. Difficulty and Discrimination. In paragraph
32b, it was suggested that item difficulty
and discriminative index are interdependent values 
to the extent that a very easy or a very hard item 
cannot have a very high discriminative index. 
While this is true for the whole range of ability
covered by the test, such an item may discriminate 
well over a narrow range. A very difficult item, 
for example, might be failed by all of the lower 
four-fifths of the examinees, but if it is answered 
by as many as one quarter of the highest scoring 
fifth, it will be a very valuable item. For it is 
necessary to select items which discriminate at all 
levels of difficulty. Figure 9 illustrates good and 
bad discrimination at various levels of difficulty. 

Figure 9. Graphs of good (high discrimination) and poor (low
discrimination) items at three levels of difficulty.



In the first chart, for example, are graphs represent- 
ing a good and bad easy item. Both have about 
the same difficulty. But the graphs make it clear
that the good item differentiates rather sharply 
between the poorest group and those only slightly 
better, while the bad item is answered by good 
and poor with approximately equal frequency. 
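
Graphs of the kind described can be tabulated by dividing the examinees into fifths on total score and finding the proportion of each fifth that answers the item correctly. The sketch below assumes at least five examinees and the same 1/0 answer lists used for item analysis; the function name is hypothetical.

```python
def proportion_correct_by_fifth(responses, item):
    """Proportion answering the item correctly in each fifth of the
    group, from the lowest-scoring fifth to the highest."""
    ranked = sorted(responses, key=sum)   # lowest total score first
    n = len(ranked)
    curve = []
    for k in range(5):
        group = ranked[k * n // 5:(k + 1) * n // 5]
        curve.append(sum(r[item] for r in group) / len(group))
    return curve
```

A good easy item rises sharply between the lowest fifth and the next; a good hard item stays near zero until the highest fifth. An item whose proportions are about equal in all fifths discriminates poorly at every level.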

c. Final Selection. Items are sorted accord- 
ing to their difficulty and their discriminative 
value. Final decision to accept or reject any item 
is made on the basis of its validity. Where several
items are approximately the same with respect to 
the other two constants (difficulty and discrimi- 
native index) the ones with the highest validity are 
selected. Of course, items which have no validity,
regardless of their other characteristics, are invariably
rejected. Items which have the same
discriminative value and the same difficulty dupli- 
cate one another and are therefore so much dead 
weight. In such cases a very slight difference in 
validity is basis for rejecting one or more items. 
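
The sorting and final selection can be sketched as follows. This is a simplification made for illustration: items are grouped into cells of like difficulty and like discriminative index, and only the most valid item in each cell is kept. The cell widths and the dictionary keys are assumptions; in practice more than one item per cell may be retained when the test requires it.

```python
def select_items(items, difficulty_step=10, index_step=0.1):
    """items: list of dicts with keys 'name', 'difficulty', 'index',
    'validity'. Returns the kept items in order of difficulty."""
    best = {}
    for item in items:
        if item["validity"] <= 0:
            continue  # no validity: rejected regardless of other characteristics
        cell = (item["difficulty"] // difficulty_step,
                round(item["index"] / index_step))
        # Items in the same cell duplicate one another; keep the most valid.
        if cell not in best or item["validity"] > best[cell]["validity"]:
            best[cell] = item
    return sorted(best.values(), key=lambda it: it["difficulty"])
```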

34. The Final Form of Test 

Having selected from the collection of items those 
which will serve best the purpose of the test, the 
total reliability and validity of the finished product
is determined by rescoring the answer sheets for
these selected items. Except for certain minor 
considerations, this rescoring will give the score 
that each individual would have received had he 
been given the final test rather than the experimental
model. Consequently, by rescoring the
papers for the first tryout of the items with a 
representative group (see par. 32a), and by per- 
forming the appropriate statistical operations, the 
reliability of the final test can be computed, and, 
if not satisfactory, adjusted by the addition or 
reselection of items. By rescoring the answer 
sheets for the second tryout, with a population for 
which a criterion is available (see par. 32d), the
validity of the total test in final form can be 
computed. 
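
Rescoring on the selected items, and one possible reliability computation, can be sketched as below. The manual speaks only of "appropriate statistical operations"; the Kuder-Richardson formula 20 used here is an assumption chosen as a common internal-consistency estimate, not necessarily the statistic the test-makers applied. It requires at least two selected items and some spread in the rescored totals.

```python
def rescore(responses, selected):
    """Score each answer sheet on the selected items only."""
    return [sum(r[i] for i in selected) for r in responses]


def kr20(responses, selected):
    """Kuder-Richardson formula 20 for the selected items:
    k/(k-1) * (1 - sum(p*q) / variance of total scores)."""
    k = len(selected)
    n = len(responses)
    totals = rescore(responses, selected)
    mean = sum(totals) / n
    variance = sum((t - mean) ** 2 for t in totals) / n
    pq = 0.0
    for i in selected:
        p = sum(r[i] for r in responses) / n   # proportion passing item i
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / variance)
```

If the value is not satisfactory, items can be added or reselected and the computation repeated on the same answer sheets.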

Section IV. ESTABLISHING THE SCALE OF 
MEASUREMENT 

35. Purpose of Standardization 

Upon completion of the item selection, the test is 
a finished product in the sense that it is an accurate 
instrument capable of producing dependable 
measurements. But these measurements will still 
be in terms of “raw” scores, that is, the number 
of questions answered correctly, or the number
right minus a fraction of the number wrong. By 
itself, a raw score is seldom of much value to the 
classification officer who has to use it, regardless 
of how accurate and dependable the test may be. 
A raw score does not tell the degree to which a man
possesses a given skill or aptitude in comparison with 
other men in the Army and is, therefore, no clear 
indication that he will do better or worse than 
others on assignment. A raw score does not tell 
what proportion of Army men stand higher or 
lower in regard to the trait under consideration. 
For each test, it is necessary, therefore, to know 
the scores of all Army men and how they are distrib- 
uted along the range from high to low. Classi- 
fication officers have neither the time nor the 
facilities for collecting this necessary data. Further, 
a time-saving and efficient technique for interpret- 
ing raw scores in terms of this data is required to 
make sound classification practicable in an Army 
of millions. The data concerning performances 
of all Army men is obtained by testing a standardization
population, as described in paragraph 36.
The device for handy interpretation by the classi- 
fication officer is the Army standard score scale, devel- 
oped by the test-makers to show what raw scores 
mean in terms of Army requirements. (See ch. 5.) 
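
The idea of converting raw scores to a common scale can be sketched as follows. The details of the Army standard score scale are given in chapter 5 and are not reproduced in this section; the mean of 100 and standard deviation of 20 used as defaults below are assumptions for the example, as are the function names.

```python
import statistics


def standard_score_table(raw_scores, mean=100, sd=20):
    """Conversion table from each observed raw score to a standard
    score, based on the standardization group's mean and spread.
    Assumes the raw scores are not all identical."""
    mu = statistics.mean(raw_scores)
    sigma = statistics.pstdev(raw_scores)
    return {raw: round(mean + sd * (raw - mu) / sigma)
            for raw in sorted(set(raw_scores))}


def percent_below(raw_scores, raw):
    """Percent of the standardization group scoring below `raw`."""
    return 100.0 * sum(s < raw for s in raw_scores) / len(raw_scores)
```

A table of this kind lets the classification officer read off a man's standing without collecting the distribution himself.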

36. Standardization Population 

It is neither feasible nor efficient to give each new 
test to the whole population of Army men. Suffi- 
ciently dependable information is obtained by care- 
ful sampling methods. Each new test is given 
in its final form to a large group carefully selected 
to represent the whole Army population as ac- 
curately as possible. The size of the group varies 
considerably, depending upon the nature of the 
problem, the availability of groups, and the re- 
quirements of speed and economy. No practical 
advantage is gained by enlarging the sample at a 
high cost in time, energy, and personnel, since 
scientific control of selection and the application 
of statistical techniques produce results that meet 
the requirements of sound classification. The 
representative group to which the final form of the 
test is given is the standardization population. 
The administration of the test to this population 
and the statistical computations which follow are 
known as standardization. 



37. Critical Scores 

The score below which men may not be accepted 
for an assignment or training course (or in the 
case of an induction test, for the Army itself) is called
a critical score. Critical scores for several different 
assignments may be set at appropriate points on 
the range of scores for a single test. For example, 
a score of 110 or above on the AGCT meets one of 
the requirements for entry into Officer Candidate 
School, while a score of 100 on the AGCT is sufficient 
to admit a man to candidacy for several of the 
Army Service Schools. The chances of success 
indicated by any obtained Army standard score 
can be readily computed. The critical score is 
set at a point dictated by Army necessities. Thus, 
if it is desired that 80 percent of the men selected 
shall complete a course successfully, or perform 
satisfactorily in a given assignment, the critical 
score could be set such that only men who stand 
a 4 to 1 chance of success will be selected. To 
select so high, however, may also mean that few 
will qualify. In establishing the critical score, it 
is therefore necessary to take into account the 
supply-demand ratio for the particular course or 
assignment in question. If the demand is small in 
relation to the supply, the critical score can be set 
high and there will be less probability of failure 
among those selected, as in the foregoing example.
Where the demand is relatively large, however, 
it will be necessary to lower the critical score in 
order to qualify more men. When this is done, 
some of those who are selected will stand a smaller 
chance of success and consequently a higher per- 
centage of failures must be expected. (See par. 
69.) 
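
The trade-off between chance of success and number qualifying can be sketched from follow-up data. The sketch assumes a list of scores and a matching list showing whether each man later succeeded; the function names and the 80 percent default are taken from the example in the text or introduced for illustration.

```python
def expectancy(scores, succeeded, critical):
    """Return (fraction of the group qualifying, fraction of the
    qualifiers who succeeded) for a given critical score."""
    qualified = [ok for s, ok in zip(scores, succeeded) if s >= critical]
    if not qualified:
        return 0.0, None
    return len(qualified) / len(scores), sum(qualified) / len(qualified)


def lowest_critical_score(scores, succeeded, target=0.80):
    """Lowest observed score at which the men at or above it
    succeeded at the target rate or better; None if none does."""
    for critical in sorted(set(scores)):
        _, rate = expectancy(scores, succeeded, critical)
        if rate is not None and rate >= target:
            return critical
    return None
```

Comparing the first value returned by `expectancy` with the demand for men shows whether the critical score must be lowered, at the cost of a higher percentage of failures.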

38. Continuing Studies 

Supply-demand ratios change from time to time, 
and even the nature of the course or assignment 
may be materially altered as a result of mechanical 
and technical innovations. The Army’s psycholo- 
gists must make sure that every test which is 
a valid instrument for predicting success in training 
also proves a sound predictor of later job perform- 
ance. When the test is released for field use, it 
remains a concern of the test makers. They make 
follow-up studies from time to time, checking the 
subsequent job performance of men who scored 
high and those who scored low. Only through the 
study of accumulated evidence can the Army be 
sure that the test is doing what it is supposed to do, 
or that an improved edition is necessary. 










TM 12-260 



26 APR 46 



Section V. SUMMARY 

39. The Steps in Test Construction 

These are the steps in test construction and the 
major purposes of each: 

a. Study of classification problem to determine
need for test, practical considerations influencing
test structure, and trait to be measured.

b. Design of test or tests to accomplish purpose
clearly defined in step a. 

(1) Selection and evaluation of criterion. 

(2) Selection of kind of test. 

(3) Selection of form of test items. 

c. Construction of items calculated to reveal 
potential competence of men. 

d. First experimental tryout to determine:

(1) Discriminative index of each item.

(2) Difficulty of each item.

e. Second experimental tryout to determine:

(1) Predictive value of each item in terms of 
criterion. 

(2) Predictive value of criterion in terms of 
actual assignment. 

f. Selection and sampling of items which have
proved most effective. 

g. Computation of validity and reliability of 
test as a whole. 

h. Establishment of field form of test for actual 
use. 

i. Standardization process: construction of
standard score and conversion tables for interpreting
test.
CHAPTER 4

THE ADMINISTRATION AND SCORING OF TESTS 



Section I. INTRODUCTION 
40. General

The scientist develops tools, the technician puts 
them to use. The instruments devised by the 
scientist, constructed on the basis of extensive 
knowledge and calibrated with great pains, can be 
valuable means of increasing the accuracy of obser- 
vations, insuring the objectivity of judgments, and 
revealing information. But the extent to which 
scientific tools are useful depends upon the discipline
and the scrupulous care which technicians
practice in using them. Like the physician’s 
stethoscope, which reveals the condition of the
human heart only to the trained observer, but is 
merely a gadget in the hands of a layman — so 
psychological tests are meaningless abracadabra 
unless employed by men who understand them 
and are willing to take the necessary pains. The 
aim of the present chapter is to clarify the work of 
the technician in giving and scoring classification 
tests so that they will produce results of the greatest 
possible value to the Army. 




Figure 10. The manual is part of the test.



41. Importance of Testing Instructions 

Specific directions for administering and scoring 
are set forth in the manuals which accompany 
each Army test and are as much an integral part 
of every test as the questions themselves. In- 
cluded in specific instructions are the exact wording 
of the directions, the time limits of the test, the 
scoring criteria, and descriptions of the proper 
technique for recording and interpreting results. 
Deviations from any of these can affect the ac- 
curacy of measurement as much as counting wrong 
answers right. Directions must be adhered to 
strictly. 

42. Testing Situation Must Be Standard 

Since it is the function of every test to compare 
each individual with others in the Army population, 
it follows that the conditions under which tests are 
administered and scored must be the same for every 
soldier, regardless of when or where the test was 
given. The scores of men tested in noisy surround- 
ings by slipshod methods are not comparable to 
those of men examined under favorable circum- 
stances. Nor are they, in all probability, accurate 
indications of the real abilities of those men. The 
use of such scores can only result in improper 
classification and misassignment, with attendant 
loss to the Army of potential skills. 

a. Testing conditions and procedures should 
be so standardized that if it were possible to find 
two individuals exactly alike, both would achieve 
the same scores, though tested at different times 
and in different places. Only scores obtained 
under standardized conditions can be relied upon 
to show what may be expected of men. 

b. No valid comparisons can be made be- 
tween the scores of men performing at different 
levels of motivation. The best method of approach- 
ing uniformity is to make sure that all men perform 
to the best of their ability. Standard conditions 
should therefore be optimal conditions. 

c. Tests should be administered and scored in a 
manner identical with that employed in their 
standardization. Standardization (see sec. IV, ch. 3)
involves the administration of each test to a
population with known characteristics in order to
obtain norms by means of which each subsequent
score may be evaluated and interpreted.

Figure 11. Good test administration requires . . . (panel label: too close)

In other
words, test performances in the field are evaluated 
by comparing them with the performance of the 
men in the standardization population. If this 
comparison is to be a valid one, the administration 
and scoring should be identical in the two in- 
stances. Army tests are always standardized in 
the field under conditions which can be duplicated 
in field installations. The principles set forth in 
this chapter make it possible to duplicate the stand- 
ardization conditions. 

Section II. PRINCIPLES AND PROCEDURES 
FOR ADMINISTERING GROUP TESTS

43. General 

It has already been stated that the procedures for 
administering tests should be such as to call forth 
the best performance of which the individual is 
capable under standard conditions. Each individ- 
ual will tend to do his best if his environment is 
reasonably free from distracting influences, if he 
understands what he is to do, and if he considers 
it worthwhile to do his best. The first of these 
conditions depends upon the physical aspects of 
the testing situation; the others upon the techniques
employed by the examiner in controlling the test- 
ing situation. 

44. Physical Surroundings 

All behaviour, including test performances, takes 
place in an environment and is influenced by that 
environment. Since it is patently impossible to 
administer tests in a vacuum, the next best thing 
is to take steps to insure that the environment 
provided is standard for all administrations of the 
test, and that it does not impede or hamper the 
performance of the examinee. While it is rec- 
ognized that ideal testing conditions cannot always 
be achieved with the limited facilities available 
in field installations, attention to the following 
factors should provide conditions that are adequate 
in most cases. 

a. So far as possible, the testing room should 
be quiet. Noise is one of the principal sources of 
distraction from concentration and mental effort. 
Yet, absolute silence is neither necessary nor de- 
sired. The individual who can perform satis- 
factorily only in a soundproof room is going to find 
few places in the Army suited to his peculiarities. 
Noise which continues steadily at a moderate and 
fairly even level of intensity can be considered as
normal for testing conditions. Such noise would
include the steady hum of indistinguishable voices 
from another part of the building, the drone of 
machines, or the continuous but muted clatter of 
typewriters. But a sudden shout outside a window, 
a bell, the clatter of unloading a truck, the blare 
of a radio, or the sound of persons passing through 
the room — these are stimuli compelling the atten- 
tion of examinees and interfering with their test 
performance. The noise need not be loud to be 
distracting. The jingle of coins in a proctor’s 
pocket or the cracking of nervous knuckles can 
be extremely upsetting. 

b. Consideration should be given to the acoustics 
of the testing room. The examiner’s voice must 
be clearly audible to all men being tested. The 
public address systems now found in most Army 
testing rooms have solved this problem, but not 
without adding others. Care should be exercised 
in placing loudspeakers and in locating micro- 
phones. The public address system is still some- 
thing of a novelty, and people feel an urge to see 
where the voice is coming from. An invisible 
ghost-voice can cause considerable craning of necks 
and unnecessary distraction; so, if a test must be 
given from an unseen location, a preliminary 
announcement to this effect will dispel distracting 
curiosity. The level of amplification should also be 
controlled. Loud directions booming forth above 
one’s head can be very disconcerting. 

c. The testing room should be well lighted and 
ventilated. There must be sufficient illumination 
on the working surface to prevent eye strain. If a
light meter can be obtained, the illumination in 
various parts of the room should be checked. 
Remember, illumination of the working space is 
the important thing. A light meter laid on this 
space should register approximately 6-10 foot- 
candles. Special care should be exercised to avoid 
glare spots and shadows; there is perhaps nothing 
as annoying as having part of the test paper in- 
tensely illuminated with the rest in the shadow 
cast by a pillar, a partition, or the examinee him- 
self. Conditions of temperature, humidity, and 
ventilation are sometimes difficult to control, yet 
every effort must be made to do so. No one can 
perform at his maximal efficiency in a room where 
the air is hot, sticky, or stale. 

d. Among the foremost factors of importance 
are the spatial arrangements of the testing room. 
If conditions permit, the examiner should be pro- 
vided with a raised platform or rostrum in a part 
of the room where he can see, and be seen by, all
men being tested. This is especially important 
where test directions call for the presentation of 
charts or other demonstration material. The desks 
or tables for the examinees should be arranged to 
leave aisles for the proctors to use in distributing 
and collecting test materials and in circulating 
about the room during the test. If possible, there 
should also be enough space between rows to allow 
passage. Otherwise, when it becomes necessary 
for a proctor to reach an individual in the middle 
of a row, there is much treading on toes and knock- 
ing elbows en route. The distracting influence of 
such stimuli needs no further comment. The 
working space itself should be flat and smooth 
and free from cracks. Pencils have an irritating 
way of punching through the answer sheet when 
there are knotholes and cracks in the board be- 
neath. If available tables are rough, a tight cover- 
ing of linoleum or pressboard (masonite) will correct 
this. The space allotted to each individual must 
be wide enough to accommodate both a test booklet 
and a separate answer sheet. Chairs with writing 
arms should not be used for testing since the writing 
surface provided is far too narrow. Many installa- 
tions now are equipped with large tables with 
vertical partitions separating the surface into 
booths approximately 30 inches wide and 18 inches 
deep. The partitions insure each person sufficient 
room and prevent the overcrowding of timid souls 
by neighbors with aggressive elbows. They also 
discourage community collaboration. If tables 
like these are not available and cannot be con- 
structed, mess tables make an adequate substitute. 
However, if these mess tables are still being used for 
eating, the hours just before and just after meals 
should be avoided in the testing schedule. The 
noises and odors issuing from the kitchen or the 
clatter of dishes from another part of the mess hall 
fall into the category of distracting stimuli.

e. The temptation to give or to receive aid 
always seems to be present wherever people are 
examined en masse. Aside from the fact that 
cheating is reprehensible from the viewpoint of 
military discipline, its effect on the validity of the 
test score requires that it be prevented. For 
classification purposes, the Army is interested in 
how many correct answers the individual can obtain 
by himself, not how many he can copy from his 
neighbor. The use of partitioned booths (de- 
scribed above) or of alternate seating will help to 
prevent collaboration. In addition, all blackboards
and charts in the room should be checked
to insure that no material is left visible to help the 
examinee, and all test booklets which are to be 
re-used should be examined after each session and 
any answers written therein erased. Despite all 
precautions, the proctors will still have to prevent 
cheating during the examination. For this reason, 
proctors should circulate (as quietly as possible) 
rather than remain at a fixed post. The mere 
nearness of the proctor on his rounds is often a suffi- 
cient curb on cheating. 

f. Not all distracting influences are in the external
surroundings. The condition of the individual,
his physical and mental state, also affects
his test performance. The man, for example, 
who has just had disturbing news from home, or is 
in physical distress, is in no condition to do his 
best on an examination. In individual cases, these 
factors cannot always be foreseen, but for the group 
as a whole, much can be done by scheduling testing 
sessions at a time of day when fatigue or physical 
or emotional discomfort can be expected to be at a 
minimum. In normal circumstances, the morning 
hours will be the best time to schedule an examination
and the end of a long day the poorest. Where
possible, activities should be controlled so as not 
to interfere with testing schedules. In the re- 
ception center, for example, processing should be 
so regulated that testing does not follow hard 
exercise, long hours of waiting in line, or immunization
“shots.” In all cases the test officer, examiner,
and proctors should be alert to the signs of
genuine distress, and the affected persons should 
be excused until a more propitious occasion. 

45. Testing Session 

The ideal testing session is a smooth-running, 
organized affair. Since its primary purpose is to 
obtain reactions to standard questions under 
standard conditions, the major portion of the time 
is allotted to taking the test itself. All other 
activities, such as assembling and seating the men,
distributing materials, giving preliminary directions,
and collecting materials, are necessary adjuncts
to the main event. Yet it is the management
of these details which can make the session
a smooth-running operation or chaotic confusion,
an ordeal to both examiner and examinee. The
secret of success is control. If the examiner is at 
all times master of the situation, he can keep things 
moving and organized. But if the examiner is 
uncertain and stumbling, if there are unnecessary 
delays, the group will become restless and irritable.
Control is best achieved by careful preparation 
and practice in all phases of the process: preliminary
arrangements, test administration, and
the collection and disposition of materials. The 
discussion that follows will cover general principles 
and specific suggestions for making the testing ses- 
sion an orderly, systematic affair. 

a. Preparation. Preliminary planning for the 
testing session involves the careful selection of the 
testing team, the instruction of all members, and 
practice drill in all required testing procedures. 
The examiner is selected for the quality of his 
speaking voice and for his ability to handle groups 
of men. While no one demands that the examiner 
have the speaking voice of a trained actor, he 
should have one that can be understood easily. 
His accent should either be indigenous to the group 
being tested; or, if the group is from many parts 
of the country, his accent should be “standard
American,” the accent common to the Midwest.
It is also desirable that the examiner be capable 
of controlling the testing situation; in many cases,
noncommissioned grade gives the examiner the 
prestige necessary. 



(1) The examiner should make a careful study 
of the manual to make sure that he knows the pur- 
pose of the test, the materials needed to give it, 
the directions to be read, and the problems which 
are likely to arise. He should study those directions 
which are to be read aloud until he can read them 
in a normal manner, without stumbling over un- 
familiar words, losing his place, dawdling, or racing 
through in an unintelligible patter. Familiarity 
with the contents of the test itself is also invaluable. 
It is excellent practice for both the examiner and 
all proctors to take each test in the normal fashion 
before attempting to administer it; this procedure 
should be standard whenever a new test is installed 



or new examining personnel are trained. In this 
way, the examiner gains an appreciation of the 
men’s viewpoint on the test and learns how to 
anticipate, and thus be prepared for, the common 
questions which may arise. 

(2) The examiner is responsible for instructing 
the proctors thoroughly in their specific duties. The 
common practice of snatching any man not at the 
moment occupied, and making him a proctor, then 
and there, should be frowned upon. It is far more 
efficient to designate regular testing teams re- 
sponsible for the administering, proctoring, and 
scoring of tests. Each proctor should be assigned 
a certain section of the room for which he will be 
responsible. Before the testing period, he should 
check the materials to be used to make sure that 
they are all there in good condition and order, and 
in sufficient quantities. He should know the order 
in which these materials are to be distributed and 
collected, so that, when the time comes, he can 
execute this phase of his assignment with efficiency 
and dispatch. With the administration of the test 
itself, his real job begins. While directions are 
being read and while the test is being taken, he 
should patrol his assigned area. Within this area 
he is responsible for: 

(a) Seeing that each examinee has all the neces- 
sary materials for taking the test, and furnishing 
these, especially pencils, where needed. 

(b) Insuring that each examinee is following 
the directions correctly and understands what 
he is to do and how he is to do it. The proctor 
should be alert to detect incorrect methods of 
marking answers where separate answer sheets are 
employed. 

(c) Seeing that each examinee is doing his own 
work, independent of his neighbors. 

(d) Excusing from the examination any person 
who is or becomes too ill to continue without 
discomfort. 

(e) Handling all inquiries of the men being 
tested. In no event, however, is he permitted to 
give information pertaining to the content or the 
meaning of the test questions, and he should inform 
the examinee that this is forbidden. 

b. Administering Test. All Army tests should 
be administered strictly in accordance with the 
manuals of directions which are supplied with them. 
A test given without the aid of the manual is no 
more trustworthy and reliable than a rifle without 
sights. What follows here can be considered as 
instructions on how to use the manual in administer- 
ing and timing a test. 

(1) It has been previously said that the value 
of any test score will depend on the extent to which 
the examinee understands just what he is to do and 
the degree to which he considers it worthwhile to 
do his best. The examiner’s primary responsi- 
bility, in fact his main function, is to elicit this 
willingness to work and to provide the proper 
instruction. The first is a matter of the appro- 
priate stage setting and of favorable attitudes. 

The second is handled by oral directions contained 
in the manual. 

[Figure 15. Each proctor is responsible for one section of the test room. His duties include: insuring that each man has all the necessary materials; insuring that each man understands directions; maintaining order and preventing cheating; excusing men who are too ill to be tested.] 

(2) In setting the stage for the test, the examiner 
should start with a brief informal statement ex- 
plaining the test to be given, how the results will 
be used, and why it is important for each person 
to do his best on it. The aim of these remarks is 
a difficult one: to dispel anxiety and release 
tension, yet at the same time, to stress the necessity 
for maximal effort and output. A careless pres- 
entation may create the impression in the minds 
of the men that the test they are about to take is 
of no consequence and in no way related to their 
future Army careers. Or it may so impress them 
with the seriousness of the situation as to give 
rise to disturbing tensions. On the whole the 
best results will be achieved through a brief, straight- 
forward but nontechnical statement of facts, de- 
livered in a manner that is neither formidable nor 
severe, nor yet so jocular or perfunctory that it 
defeats its own purpose. 

(3) Having set the stage and gained the neces- 
sary cooperation, and having distributed the test 
material, the examiner next informs the men ex- 
actly what they are to do. There is only one way 
to do this— by reading aloud the directions provided 



in the manual. This is the right way, the Army 
way, and the only way. And it means reading aloud 
all the directions that are to be read aloud and no 
more. They should be read in a natural voice, 
in a smooth, coherent fashion. Hence, they must 
have been thoroughly practiced. Notice, however, 
that they should be read, not paraphrased, given 
from notes or memory, or adapted to someone’s 
idea of what is more appropriate for local conditions. 
Every test was “zeroed-in” with its directions; the 
sight-setting must not be altered or the results will 
be wide of the mark. 

(4) Nearly all Army tests are given with certain 
time limits which must be strictly observed if 
testing conditions are to be uniform from session to 
session, and from place to place. (See par. 26.) 
These time limits, either for the complete test, or 
for various parts of the test separately, are always 
specified in the test manual. They are exact, not 
approximate, so timing should be handled with care. 
Time limits should be explained carefully to the 
examinee. If a stop watch is available, it should 
be used. If not, any good watch with a second 
hand will serve, if used in the following manner: 

(a) When giving the signal to start the test, 
note down on paper the hour, minute, and second 
of starting. 

(b) Write below this time the hours, minutes, 
and seconds of working time for the test as specified 
in the manual. 

(c) Add these two figures to obtain the exact 
time when the signal to stop work should be given. 
(If the minutes add up to more than 60, of course, 
the sixty minutes would be carried as an additional 
hour and the excess listed as minutes.) Example: 

    Starting Time            1451:00 
    Time Limit for Test        45:00 
    Stopping Time            1496:00, or 1536:00 

The signal to stop should, therefore, be given 
promptly at 1536. The timing should always be 
done in this way. It is unwise to trust to memory 
or to attempt the necessary computations mentally. 
And it is good practice to have some of the proctors 
check the timing independently. 
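
The carrying of minutes into hours described above can be checked with a short sketch. This is only an illustration, not part of any prescribed procedure; the function name `stopping_time` and the "HHMM:SS" and "MM:SS" input formats are assumptions chosen to match the example.

```python
def stopping_time(start: str, limit: str) -> str:
    """Add a time limit ('MM:SS') to a starting time ('HHMM:SS', 24-hour clock)."""
    clock, start_sec = start.split(":")
    hours, minutes = int(clock[:2]), int(clock[2:])
    limit_min, limit_sec = (int(part) for part in limit.split(":"))

    # Work in seconds so that minutes and seconds carry correctly.
    total = hours * 3600 + minutes * 60 + int(start_sec)
    total += limit_min * 60 + limit_sec
    total %= 24 * 3600  # wrap around midnight (an assumption of this sketch)

    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}{m:02d}:{s:02d}"


# The example from the text: 1451:00 plus 45:00 gives 1536:00.
print(stopping_time("1451:00", "45:00"))
```

A sketch of this kind may serve as the independent check by a proctor; it does not replace writing the figures down on paper.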

c. Collection and Disposition of Test 
Materials. After the signal to stop work has been 
given, the materials should be collected as quickly 
as possible. The period between tests or at the end 
of a session can be one of tremendous confusion 
with everyone talking, comparing notes, reaching 
for coats and hats, and being anxious to leave. 
Under these circumstances the test materials are 
apt to be collected in haphazard fashion, with 
booklets and answer sheets jumbled together. 
And in the confusion booklets have a way of dis- 
appearing. Remember that Army tests are classi- 
fied as restricted and must be accounted for. Re- 
member also that the booklets will be used again 
and the answer sheets have to be scored. Planned 
and orderly collection of materials will pay big 
dividends in the time saved during these future 
operations. The following system achieves a maxi- 
mum of order and control. 

(1) As soon as the stop signal is given, the 
examiner should instruct the men to remain quietly 
in their seats and to follow directions in order to 
expedite the collection of materials. 

(2) He should then direct them to pass these 
materials to the ends of the rows, specifying which 
end. The materials should be passed separately, 
first answer sheets, then test booklets, then supple- 
mentary materials, such as scratch paper, and 
finally, pencils. 

(3) The men at the ends of the rows should be 
instructed to stack the materials in separate piles, 
making sure that all booklets are closed, with the 



cover sheet outside, and that all answer sheets are 
faced the same way. 

(4) When this is done, the proctors can collect 
the materials and at the same time make a rapid 
count of the numbers turned in. Only after all 
materials have been checked in and accounted for 
should the group be dismissed or the next test 
begun. 

d. Care of Booklets. After each testing 
session, and certainly before the next session, all 
test booklets should be carefully scrutinized for 
answers or marks of any kind. In spite of all 
warnings, some persons will write answers in the 
booklets or use them for scratch paper. If the 
answers or marks can be erased, this should be done. 
If not, or if the booklet is worn or torn, it should 
be destroyed in accordance with paragraph 60, 
AR 380-5. All used scratch paper should be 
destroyed. All tests and testing supplies should 
be kept under lock and key when not in use. 

Section III. DIRECTIONS FOR SCORING 
GROUP TESTS 

46. General 

Scoring a test is fundamentally the procedure by 
which the number of correct answers is counted; 
in many instances this simple process is elaborated. 
Some tests of the multiple-choice answer type, for 
reasons discussed in the previous chapter, are scored 
by counting the number of correct answers, and sub- 
tracting from this figure some fraction of the number 
of wrong answers. If this is to be done, the manual 
will so state, and will indicate what proportion 
of the wrong answers is to be subtracted. The 
statement “right minus one-fourth wrong, ” for 
example, means that one-fourth of the number of 
wrong answers is to be deducted from the number 
of right answers. This statement is known as the 
scoring formula for the test. One must be sure 
to use the proper scoring formula for each test as 
given in the manual for that test and to follow it 
exactly. (See par. 30a.) 
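
As an illustration of a scoring formula of this kind, the following sketch computes "right minus one-fourth wrong." The function name `raw_score` and the default fraction are assumptions for the example; the fraction actually used is always the one given in the manual for the test, and any rounding rule is not covered here.

```python
from fractions import Fraction


def raw_score(right: int, wrong: int, fraction: Fraction = Fraction(1, 4)) -> Fraction:
    """Return right answers minus the stated fraction of wrong answers."""
    return right - fraction * wrong


# 60 right and 20 wrong under "right minus one-fourth wrong": 60 - 5 = 55.
print(raw_score(60, 20))
```

Exact fractions are used so that no precision is lost before whatever rounding the test manual prescribes.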

47. Hand Scoring and Machine Scoring 

Some tests must be hand scored, and others, using 
special answer sheets and special pencils, may be 
scored by means of the International Test Scoring 
Machine. Both methods may make use of a scoring 
formula, but with the scoring machine the correction 
can be made automatically. However, some tests 
are beyond the capacities of the machines. Which 
method is employed will, therefore, depend upon 
the nature of the test and the availability of the 
machine. In this connection, it is important to 
note that all machine-scorable tests may also be 
hand scored. 

48. Scoring Team 

a. Whether scoring is done by hand or by 
machine, there are a number of steps or operations 
that must be performed in orderly sequence. Effi- 
cient scoring will require a team, with a different 
individual assigned to each of the successive steps 
in the process. Members of the team will complete 
only the operations to which they are assigned, 
passing the papers along to others who perform 
succeeding operations. With hand-scored tests 
of the type having the answers on the test page it- 
self, it is efficient to have each scorer handle a 
single page, counting the number of correct answers 
and writing this at the bottom of the page. Other 
members of the team should be designated to add 
together the scores for each page and convert total 
scores into standard score terms; still others should 
check each of the steps in the process. A convenient 
method of handling machine-scored answer sheets 
requires a team composed of one man to do each of 
the following (the operations are described in later 
paragraphs): 

(1) Scan answer sheets. 

(2) Count attempts. 

(3) Convert raw scores to standard scores. 

(4) Operate the scoring machine. 

(5) Check the conversion. 

(6) Spot check (every 25th answer sheet, for 
example) by hand scoring. 

b. The importance of checking throughout can- 
not be overemphasized. If the test is worth 
enough to be allotted an hour of the examinee’s 
time, it is certainly worth the additional minute or 
two required to insure that the score is an accurate 
one. 

49. Instructions for Hand Scoring 

Hand-scored tests are of two types: expendable 
tests in which the answers are made directly on the 
booklet itself, and nonexpendable tests with which 
a separate answer sheet is used. With tests of the 
first type, scoring keys, or scoring stencils, or merely 
a list of correct answers may be provided. It is 
advisable to devise some form of key or stencil if 
none is provided. With a few Army tests, such as 
the Qualification Test (Q-1 or -2) and the Classifi- 
cation Test R-1, the answers will be check marks or 
figures or underlined words scattered over the face 
of the test itself. With tests of this kind, the stencil 
is most efficient. Satisfactory stencils can be made 
with large sheets of clear celluloid or plastic (ex- 
posed X-ray films, for instance) cut away in the 
places where the correct answers will appear when 
the stencil is laid over the test. In this way the 
answers appearing in the windows of the stencil can 
be checked and counted with ease. The trans- 
parency of the stencil adds the advantage that 
answers on other parts of the paper, wrong answers, 
can also be counted without removing the stencil. 
Other expendable tests are arranged so that the 
answers— check marks, underlinings, write-ins, 
etc. — are entered in a column running down the 
right-hand edge of the test page. Scoring such tests 
is facilitated by the use of a strip key — a strip of 
cardboard with the correct answers indicated with 
proper spacing so that when set down alongside the 
column of answers on the test page, the examinee’s 
answers and the correct answers can be directly 
compared. Both stencils and strip keys can be 
readily improvised in the field. 

50. Hand Scoring Separate Answer Sheets 

Many Army tests make use of separate answer 
sheets on which the answers are indicated by a black 
line in the proper space. With tests of this type, 
scoring keys are usually provided. These consist of 
opaque cards that fit over the answer sheet and have 
holes punched in the positions in which the correct 
marks will appear. In using a key of this kind, 
provision should always be made for aligning the 
key with the edges of the paper, or preferably with 
two fixed “landmarks” on opposite corners of the 
answer sheet. Much time in lining up the key will 
be saved thereby, and increased accuracy will be 
gained. The procedures outlined below for scoring 
papers and for providing all necessary checks have 
proved valuable over a period of years and are 
strongly recommended. In using this method, one 
will require, besides the punched-hole scoring key, 
two pencils of different color, red and blue, for 
example. The method is essentially one of count- 
ing right answers, wrong answers, and omissions 
(questions with two or more answers marked are 
counted as omissions). The successive steps in the 
method are: 

a. Look over each answer sheet and draw a red 
pencil mark horizontally through all response posi- 
tions for each item for which the examinee has made 
more than one choice or no choice. Care should be 
taken not to mark in this way response positions for 
which the intended choice is clearly indicated even 
though more than one is marked. The sum of 
these red marks will be the number of omissions. 

[Figure: Use of scoring stencil. A = marked answer sheet; B = scoring stencil; C = scoring stencil placed over answer sheet.] 

b. Place the punched-hole key, right side up, 
over the answer sheet and register it. Count the 
number of black marks made by the examinee, which 
appear through the holes of the key, excluding, of 
course, those black marks with a red line running 
through them (multiple answers). The sum of 
these black marks will be the number of right 
answers. 

c. With the key still in place, draw an X with 
the blue pencil on the answer sheet through each 
hole where there appears neither a black mark (right 
answer) nor a horizontal red mark (omission). The 
sum of these blue marks will be the number of 
wrong answers. 



d. If the scoring formula specifies that the raw 
score is merely the number of right answers, only 
steps a and b need be performed. If some fraction 
of the number of wrong answers is to be deducted 
from the number of right answers, the figures ob- 
tained from steps b and c will be entered in the proper 
formula to obtain the raw score. 

e. A check is provided, in that the number of 
omissions (step a) plus the number of right answers 
(step b) plus the number of wrong answers (step c) 
should equal the total number of items on the test 
(exclusive of practice items). 

f. There are two additional sources of error that 
must be guarded against. If the test does not 
utilize all of the response spaces on the answer sheet, 
care must be exercised to insure that no marks 
beyond the last question are counted. In other 
words, the sum of omissions, right answers, and 
wrong answers must not be more than the number 
of items on the test. The same precautions must 
be taken to avoid counting any of the practice 
items. 
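
The counting method of steps a through f can be sketched as follows. This is an illustration under stated assumptions: the function name `tally`, the representation of each item as a set of marked choices, and the sample key are all hypothetical, and the sketch treats every multiple mark as an omission without the scorer's judgment about a clearly intended choice.

```python
def tally(marks, key):
    """Count omissions, right answers, and wrong answers.

    marks: one set of marked choices per response position on the answer sheet.
    key:   the correct choice for each scored item (practice items excluded).
    Response positions beyond the last item of the key are ignored.
    """
    omissions = rights = wrongs = 0
    for i, correct in enumerate(key):
        marked = marks[i] if i < len(marks) else set()
        if len(marked) != 1:       # no choice or more than one choice (step a)
            omissions += 1
        elif correct in marked:    # mark shows through the hole in the key (step b)
            rights += 1
        else:                      # hole shows neither a mark nor an omission (step c)
            wrongs += 1
    # The check of step e: the three counts must equal the number of items.
    assert omissions + rights + wrongs == len(key)
    return omissions, rights, wrongs


key = ["A", "C", "B", "D", "A"]
# The sixth entry lies beyond the last question and must not be counted.
marks = [{"A"}, {"B"}, set(), {"D"}, {"A", "C"}, {"B"}]
print(tally(marks, key))
```

Iterating over the key rather than over the marks is what keeps stray marks beyond the last question out of the count.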

51. Machine Scoring Answer Sheets 

The widespread use of tests in the Army classifi- 
cation system would have been wellnigh impossible 
without the scoring machine. With this delicate 
and intricate instrument, scoring which would 
require the labor of several men for hours can be 
handled by one man in a much shorter time. The 
principle of the International Test Scoring Machine 
is simple. The graphite deposited by a special lead 
pencil in making a mark on paper will conduct 
an electric current. If two wires from a source of 
power are pressed against such a mark, the circuit 
will be completed. The current will be carried from 
one wire through the mark to the other wire and it 
will cause a deflection of the needle of a galvanom- 
eter connected in series in the circuit. If there 
are hundreds of these simple circuits, all connected 
to the same galvanometer, all of those which are 
closed by means of pencil marks will add to the 
current flowing through the galvanometer. In 
other words, the amount of the deflection will tell 
how many of the circuits are completed. In a 
sense, therefore, the galvanometer reading is a 
count of the number of pencil marks. If the 
answers to a test are indicated by soft lead pencil 
marks in a specified place on an answer sheet, and 
if this answer sheet is then pressed up against a 
mass of open-end circuits (or electrodes) the dial of 
the galvanometer will register the number of such 
marks. But if a punched-hole scoring key is in- 
serted between the answer sheet and the electrodes, 
the current carried by the “right” pencil marks can 
be routed one way, and the current carried by the 
remaining marks routed another way. Thus the 
dial can be made to register the number of right 
answers, the number of wrong answers, the number 
right plus the number wrong, and finally the num- 
ber right minus any portion of the number wrong, 
all at the will of the operator and the turn of a 
switch. This general account is provided to furnish 
understanding of the machine requisite to accurate 
scoring and to help the operator guard against its 
limitations.* 
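
The counting principle just described can be imitated in a few lines. This is a simplified sketch, not a description of the machine's circuitry: each pencil mark is treated as one closed circuit, the key divides the marks into "right" and remaining marks, and the switch names other than R + W are assumed labels for the example.

```python
from fractions import Fraction


def meter_reading(marked, key, switch, fraction=Fraction(1, 4)):
    """Return the count the dial would show for a given formula switch.

    marked: set of (item, choice) positions bearing a pencil mark.
    key:    set of (item, choice) positions punched in the scoring key.
    """
    right = len(marked & key)   # marks routed through the holes of the key
    wrong = len(marked - key)   # all remaining marks, including stray ones
    if switch == "R":
        return right
    if switch == "W":
        return wrong
    if switch == "R+W":
        return right + wrong
    if switch == "R-fW":
        return right - fraction * wrong
    raise ValueError(f"unknown switch setting: {switch}")


key = {(1, "A"), (2, "C"), (3, "B"), (4, "D")}
marked = {(1, "A"), (2, "B"), (4, "D")}
print(meter_reading(marked, key, "R+W"))
print(meter_reading(marked, key, "R-fW"))
```

Because the sketch, like the machine, counts marks rather than intentions, a stray mark in any response position is counted as a wrong answer.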

a. The scoring machine is an amazingly intri- 
cate and wonderful device, but is not human. While 
this is something of an advantage, insofar as it 
eliminates some of the human frailties of sub- 
jectivity and inaccuracy, still it limits the machine’s 
powers of discrimination. It cannot, for example, 
tell the difference between an intended answer and 
a stray pencil mark, and will count both indis- 
criminately. Also, it cannot count a pencil mark 
if this is not brought in contact with the electrodes. 
For these reasons a specially printed answer sheet, 
with response spaces properly located, must be em- 
ployed. Since not all pencil leads contain the 
necessary ingredients, a special pencil must be used 
and a good solid mark must be made to indicate 
answers. 

*A technical description of the International Test Scoring 
Machine will be found in the Manual of Instruction for the 
International Test Scoring Machine, C. R. 9145 (revised). 

b. Because the scoring machine is, after all, 
a mechanical device, the answer sheets must be 
carefully prepared for scoring if accurate results 
are to be obtained. This procedure is called scan- 
ning, and consists of a thorough check of each of the 
following points. Make sure: 

(1) That each pencil mark is heavy and black. 
Light marks should be gone over with the special 
pencil. 

(2) That each mark is in the space between the 
pair of dotted lines and entirely fills this space. 

(3) That all stray pencil marks on the paper 
clearly not intended as answers are erased. 

(4) That no response position for any question 
has more than one answer indicated. If such 
multiple answers occur, all marks in the response 
position should be thoroughly erased, and that item 
considered as an omission. The answer sheet 
should be hand scored if erasures are impractical. 

c. The operator must familiarize himself with 
the method of inserting the keys, of checking and 
adjusting the scoring circuits, and of using the 
formula switches. All of this material is con- 
tained in the Manual of Instruction for the Inter- 
national Test Scoring Machine which accompanies 
each instrument. For utmost accuracy with the 
use of the machine, the following method of scoring 
is strongly recommended. After the keys have 
been inserted, the scoring circuits checked and 
adjusted, and the answer sheets scanned (see b 
above), the successive steps of the method are: 

(1) Count the number of questions attempted 
and enter this figure in one of the score spaces at the 
edge of the answer sheet. It will usually be simpler 
to count the omissions (remembering that multiple 
answers are erased and counted as omissions) and 
subtract this figure from the total number of ques- 
tions. 

(2) Insert the answer sheet into the machine 
and set the formula switch at R + W (right plus 
wrong). The resultant meter reading should be 
the same as the number of questions attempted 
[step (1)]. If these two figures differ by more than 
one point, the paper should be set aside for further 
scanning or for hand scoring. 

(3) If the two values check, set the formula 
switch according to the desired scoring formula. 
The resultant reading is the raw score for the test 
and should be recorded in a designated score space 
at the upper edge of the answer sheet. 
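
The comparison made in step (2) can be sketched as below. The function name `machine_check` and its parameters are assumptions for illustration; the tolerance of one point is taken from the text.

```python
def machine_check(total_items: int, omissions: int, meter_r_plus_w: int) -> bool:
    """Return True if the R + W meter reading agrees with the hand count of attempts.

    Attempts are found by subtracting omissions from the total number of
    questions, as suggested in step (1). A difference of more than one point
    means the paper should be set aside for further scanning or hand scoring.
    """
    attempted = total_items - omissions
    return abs(meter_r_plus_w - attempted) <= 1


# 150 questions with 10 omissions gives 140 attempts.
print(machine_check(150, 10, 139))  # within one point
print(machine_check(150, 10, 137))  # set the paper aside
```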

d. Accumulated experience has demonstrated 
that the scoring machine, properly used, will result 
in greater accuracy than the average hand-scoring 
methods. To insure the maintenance of this high 
level of accuracy, special attention is directed to 
the following precautions: 

(1) At the beginning of each day's scoring, and 
after each 100 answer sheets have been scored, 
check the scoring circuits according to the methods 
outlined in the Manual of Instruction for the In- 
ternational Test Scoring Machine. If the machine 
is not working properly, get in touch with the 
nearest IBM representative, and use hand-scoring 
methods until it has been satisfactorily adjusted. 

(2) Rescore by hand-scoring methods a random 
sample of answer sheets each day as an additional 
check on scoring procedure. 

(3) On damp days use the heating unit to dry 
the papers preparatory to scoring. Moisture also 
is a conductor of electricity. 



Section IV. DIRECTIONS FOR 
ADMINISTERING INDIVIDUAL TESTS 

52. General 

For the most part, the tests employed by the Army 
classification system are of the paper-and-pencil 
type administered to groups of examinees at one 
time. In certain special circumstances, proper 
classification and disposition of the soldier will 
demand the administration of an individual test. 
The expenditure of time and effort is greatly in- 
creased with the employment of such instruments, 
and their use must be limited. In general, the in- 
dividual type of test is recommended for cases in 
the following categories: 

a. Where the paper-and-pencil type test is in- 
appropriate because the subject is lacking in the 
educational skills of reading and writing. 

b. Where more personal contact is needed to 
insure that the examinee is at ease, is properly 
motivated and encouraged, and knows just what he 
is supposed to do. 

c. Where it is desired not only to determine the 
individual’s over-all score, but also to give the 
examiner the opportunity to observe him at work 
and to estimate his specific strengths and weak- 
nesses. 

53. Individual Testing Session 

The individual testing session is a more personal 
and somewhat less formal affair than the group 
testing session. This does not make it easier to 
manage; on the contrary, the individual test admin- 
istrator needs much more than average training 
and experience to achieve results which can be 
accepted with any confidence. His task is a difficult 
one. He must administer a rigidly controlled and 
carefully standardized set of problems under condi- 
tions precisely as specified, all the while creating an 
impression of friendly informality. Men assigned 
to this job must be selected with care. The ideal 
examiner is a man with a knowledge of the principles 
of psychological measurement and an appreciation 
of the needs for exactness and precision. He is 
personable and friendly and an easy conversa- 
tionalist. He is patient and tolerant, never given 
to a show of arrogance, flippancy, or sarcasm, no 
matter how absurd the responses of the subject 
might be. And he is a monument of tactfulness. 

a. Individual tests, because they are essentially 
personal interviews, should be given in an at- 
mosphere of privacy. Special rooms are recom- 
mended, but separate booths divided by partitions 
will serve where facilities are limited. Examiner 
and subject should be provided with straight-back 
chairs of a comfortable height, facing each other 
across a table or desk. This table or other working 
surface should be large enough to accommodate all 
of the test materials and provide room for the 
examiner to jot down responses on the record sheet. 
The common field table is of the proper dimensions 
for most individual testing. 

[Figure 16.] 

b. Before undertaking the administration of an 
individual test, the examiner should make a careful 
study of the manual of directions and of the test 
materials. The exact wording to be used in pre- 
senting the materials will be specified, and should 
be rehearsed until it can be read in a normal con- 
versational manner. The examiner should also 
practice the things he is to do — the placement of 
the materials, movements, pointings, demonstra- 
tions, etc. — until these are smoothly coordinated 
with the verbal directions. Finally, he should give 
practice administrations with “guinea pig” subjects, 
under the supervision of a qualified examiner until 
he can maintain the subject's interest and confi- 
dence, select and use necessary materials and in- 
structions without fumbling, and make of the whole 
procedure a smooth and effective performance. 

c. The administration of an individual test 
begins as soon as the examinee enters the room. 



The first step is to get him into the proper mental 
condition for taking the test; to remove fear and 
tension which may conceal qualities valuable to the 
Army. The man reporting for the test is quite apt 
to be afraid, discouraged, misinformed, or truculent, 
and in no condition to perform in a fashion that can 
be considered a trustworthy indication of his true 
ability. Here the care exercised in selecting ex- 
aminers will begin to pay dividends. The skillful 
examiner will greet the subject in an affable manner, 
ask him questions about himself, about his work, 
listen to his complaints, and give every indication 
of being truly interested. He will often have to call 
upon all his skill and patience to carry this off with- 
out creating an air of forced and stilted play acting. 

d. The transition from this informal chat to the 
presentation of the test materials should be gradual 
and natural. The test manual will probably con- 
tain suggestions for bridging this gap, or such state- 
ments as “I have some problems here I would like 
you to try,” or “Let’s see what you can do with these 
questions,” will serve the purpose. From this point 
on, the presentation of the problems, the questions, 
and all directions to the subject must follow the 
exact wording of the manual. Moreover, any per- 
formance materials, such as blocks or pictures, must 
be placed on the table or exposed precisely as speci- 
fied. And the different parts of the test must be 
given in order, without skipping around. It can- 
not be overstressed that any departure from the 
manner of administration in which the test was con- 
structed and standardized will make the test score 
unreliable. 

e. The examiner should speak distinctly and 
slowly while administering the test so that the ex- 
aminee may hear and understand, for the examiner 
may not repeat any questions (unless some unex- 
pected disturbance has prevented his being heard 
the first time). He must guard against gestures, 
words, or inflections of the voice that may suggest 
an answer. Throughout the course of the examina- 
tion, the examiner will have to stimulate the sub- 
ject to do his best. He will have to make appropri- 
ate remarks of approval or praise after each suc- 
cess, and he will have to console and encourage him 
when he fails. Suggestions for such appropriate 
remarks will usually be included in the manual for 
administering the test. 

54. Timing 

With the individual test, the problem of timing is 
usually somewhat more difficult than with group 
tests. This is so because the problems and ques- 
tions which compose the individual type of test 
often have separate time limits, and these limits 
are often specified in terms of seconds rather than 
large intervals. Furthermore, many of the items 
of an individual test are scored in terms of the time 
required to arrive at the correct solution. This 
means that the examiner will have to look at his 
watch frequently. Yet, because of the close per- 
sonal nature of the situation, the examiner cannot 
be too obviously engrossed in the time problem 
without creating a disturbing and distracting ten- 
sion in the examinee. Everyone has experienced 
the feeling of nervous strain when working against 
time and the maddening tendency of fingers to be- 
come all thumbs as the seconds tick off. Some of 
this tension is natural, of course, when one is told 
to work as rapidly as he can; but it is tremendously 
heightened if the examiner is a nervous clock watch- 
er. So, the timing should be done unobtrusively 
and with the appearance of casualness. This does 
not mean, however, that it can be slipshod. It is 
of utmost importance that the timing be precise 
and that the exact limits specified in the manual be 
observed. Where tests are scored in terms of time, 
an error of a few seconds may account for a differ- 
ence of several points of score. It is essential that 
the examiner should be thoroughly practiced in tim- 
ing. If possible, he should use a stop watch, be- 
cause this will always start at zero, and because the 
timing can be done with one hand, leaving the at- 
tention of the examiner focused on the test itself. 
If a stop watch is unobtainable, an ordinary watch 
will be used. It should be of the type equipped 
with a large sweep second hand for ease of reading, 
and the examiner should always give the starting 
signal for any item when the second hand is at zero. 

Section V. SCORING INDIVIDUAL TESTS 

55. General 

In addition to administering and timing the parts 
of the test, the examiner must record and score the 
subject’s responses. In doing this, he will be con- 
cerned with a record or score sheet and with lists 
of scoring criteria and time credits contained in the 
manual for the test. He should study these care- 
fully and use them precisely as directed. 

56. Score Sheet 

The score sheet for any particular individual test
is usually a single page upon which are arranged the 
spaces for recording or indicating answers and 
scores. On this page will be separate areas for 
each of the subparts composing the test, and these 
in turn will be lined to accommodate the answers 
to individual items. The record of the response to 
a particular item may be a mark (plus or minus), a 
time interval, or a word or phrase. The score sheet 
will usually contain all scoring credits and weights 
to facilitate scoring. For instance, if each correct
response is to be allowed a certain number of cred- 
its, these can be read off directly, entered in another 
column, and summed to yield the total score for 
that subpart. Finally, the record sheet will usually 
provide a section for the recapitulation of part 
scores, for any multiplying factors employed, and 
for entering subscores, total scores, and converted 
scores. The examiner jots down the marks, words,
phrases, or times in the appropriate rows and col- 
umns as the testing session progresses, and, where 
possible, he allots to each response the credit it 
has earned at the time it is given. At the end of 
the examination, he can verify any doubtful an- 
swers by reference to the scoring criteria in the 
manual, add the credits, and compute the various 
scores. In doing all of this, he should avoid direct- 
ing the examinee’s attention to the machinery of 
scoring, or to any particular score credit given. 

It is advisable to hold the manual with the left 
hand in such a way that it shields the score sheet 
from the subject’s view. 
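
The bookkeeping described in this paragraph (credits per item, a sum for each subpart, a multiplying factor, and a total) may be sketched as follows. This is an illustration only: the subpart names, credits, and weights are invented and are not taken from any authorized test.

```python
# Hypothetical score sheet: each subpart lists the credits earned per item
# and the multiplying factor (weight) applied to the subpart sum.
score_sheet = {
    "Subpart A": {"credits": [2, 0, 1, 2], "weight": 1},
    "Subpart B": {"credits": [3, 3, 0], "weight": 2},
}


def recapitulate(sheet):
    """Return (weighted part scores, total raw score)."""
    part_scores = {}
    for name, part in sheet.items():
        part_scores[name] = sum(part["credits"]) * part["weight"]
    return part_scores, sum(part_scores.values())


part_scores, raw_score = recapitulate(score_sheet)
print(part_scores)  # {'Subpart A': 5, 'Subpart B': 12}
print(raw_score)    # 17
```

The total obtained in this way is the raw score, which is then converted as directed in the manual for the test.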

57. Scoring Criteria 

The scoring criteria and tables of time credits will 
be included in the manual for the test. The nature 
of the individual testing session makes the subject’s 
responses less circumscribed than responses to the 
paper-and-pencil type test. The individual test 
examinee will seldom be required, for example, to 
choose one of four possible answers or to respond 
with a single word or figure. More often he will 
be asked to explain or discuss in his own words or 
to offer his solution to a problem. The fact that 
the response is freer should not, however, imply that 
it is less objective. In order to provide the desired 
objectivity and uniformity in scoring, it is necessary 
to draw up certain criteria to be used as standards. 
These criteria will be in the nature of lists of all 
possible acceptable answers or of examples of the 
type of answer for which full or part credit will be 
allowed. They will also sometimes list nonaccept- 
able answers for which no credit is to be given. 
Figure 17. Example of an individual test score sheet.

Faithful, even slavish, adherence to these criteria
is a primary requisite for uniform and accurate
scoring. The examiner should adopt the strict 
rule that he will never give credit for a response not 
indicated as acceptable no matter how satisfactory 
or plausible it may seem to him. If the response 
in question is not among those listed, one may safe- 
ly assume that it has been omitted — not through 
oversight— but because it is not a wholly accurate 
response or one typical of the best solutions to the 
problem. Again it will be helpful to remember 
that tests are standardized prior to release accord- 
ing to the directions and criteria contained in the 
manual, and any deviation from these can only 
result in a loss of uniformity. Occasionally, it is
true, a rare but correct answer will crop up. But 
before this can be credited as correct, it should be 
analyzed and included, officially, in a revision of the 
criteria. A common weakness of examiners is the
tendency to allot credit for what they think the 
subjects mean to say rather than for what they do 
say. The urge to do this may spring from a desire 
to help out, or may result from a feeling that the 
examinee knows more than his answer would im- 
ply. In either event the tendency should be avoid- 
ed. If examiners were unerring judges of human 
capacities, testing would not be necessary, but ex- 
perience has abundantly demonstrated that tests 
are far more reliable than individual “hunches.”

Section VI. OBTAINING STANDARD SCORES

58. General 

The number of correct answers to a test, or the number
of right answers less some portion of the number
wrong, or in the case of an individual test, the
sum of all the individual credits allowed, is called
the raw score. For reasons that will be made clear

TM 12-260 



26 APR 46 



in the next chapter, this raw score is seldom the 
one which is reported or made a matter of record. 
With most Army tests a uniform scale of standard 
scores is employed, and the last step in the examining
and scoring procedure involves the conversion
of the obtained raw score into standard score form. 
In practice this is a simple matter of using the prop- 
er conversion table which will be found in the man- 
ual for the test. These tables are drawn up in two 
columns, the first for all possible raw scores that 
can be obtained on the test, arranged in order of 
descending magnitude, and the second for the cor- 
responding standard scores. The total range of 
scores is generally also divided into five Army grade 
intervals. Having arrived at a certain raw score 
for the test, the examiner or scorer can locate this 
in the first column of the conversion table and read 
across to the corresponding standard score. Then, 
by reference to the grade divisions, he can determine 
the Army grade into which it falls. As in all purely 
clerical activities, however, errors frequently occur. 
Therefore, the conversion should always be given 
an independent check. 
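
The conversion and its independent check may be pictured as a simple table lookup. The sketch below is an illustration under stated assumptions: the table values are invented, and the function names are hypothetical. Real tables are found only in the manual for each test.

```python
# Hypothetical conversion table: raw score -> Army standard score.
# The values are invented for illustration.
CONVERSION_TABLE = {50: 130, 49: 128, 48: 126, 47: 124, 46: 122}


def to_standard_score(raw_score, table=CONVERSION_TABLE):
    """Look up the standard score; refuse raw scores not in the table."""
    if raw_score not in table:
        raise ValueError(f"raw score {raw_score} is not in the conversion table")
    return table[raw_score]


def checked_conversion(raw_score, first_reading, table=CONVERSION_TABLE):
    """Independent check: a second lookup must agree with the first reading."""
    return first_reading == to_standard_score(raw_score, table)


print(to_standard_score(48))        # 126
print(checked_conversion(48, 126))  # True
print(checked_conversion(48, 128))  # False: a misread row is caught
```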

59. Summary 

The standard score and Army grade are the end 
results of the administering and scoring process. 
If the principles and procedures set forth in this 
chapter are faithfully followed, scores will be ob- 
tained that can be accepted with confidence as true 
measures of the examinees' skills and capacities. 
The manner in which these results are to be interpreted
and the uses to which they are put in the
general system of classification in the Army
will be the subjects of later chapters.

CHAPTER 5

THE MEANING AND INTERPRETATION OF TEST SCORES

Section I. THE ARMY STANDARD SCORE SCALE

60. Review and Orientation 

Scores on most Army tests are reported and re- 
corded in the form of standard scores. As has been 
stated (par. 35) this is done because such standard 
scores make analysis of test data more efficient; 
they reveal in a way that is clear and easy to grasp 
the information required by personnel concerned 
with classification. Each particular Army stand- 
ard score is, in effect, a summary not only of the 
data concerning the performance of a man on a 
particular test, but also a summary of the signifi- 
cance of his performance. A man’s test perform- 
ance has the following general kinds of significance 
to the Army: it tells how the man compares with 
others as to the amount of a particular aptitude or 
skill he possesses; and, coupled with a statement of 
the test’s validity, it tells approximately what per- 
formance on a particular assignment may be ex- 
pected of a man, and how the performance expected 
of him compares with that expected of others. 

61. Development of Standard Score Scale 

The most practicable methods of summarizing test 
data and their significance are statistical, and the 
standard score scale is, therefore, a statistical
development. For all Army tests, the necessary
data are computed by expert technicians and sup- 
plied in the form of convenient conversion tables. 
However, the concept of the standard score will be 
interpreted more correctly and used more effective- 
ly if all classification personnel gain a general under- 
standing of the statistical techniques involved. 

a. Other Types of Scores. A brief analysis 
of the limitations of other scoring systems will make 
clear many of the advantages of the standard score 
scale. 

(1) The familiar system which attaches arbitrary
values to percentages, as for example 60 percent
meaning “poor,” 80 percent meaning “good,”
90 percent meaning “excellent,” is not useful to
the Army. Such scores, when achieved on different
tests, cannot be compared. Ninety percent on
an arithmetic test may represent an entirely different
degree of ability than 90 percent on a language
test. Arbitrary percentage scores, therefore,
do not indicate whether a man may be better at
mathematics or languages. Moreover, arbitrary
values given to percentages often depend upon the
judgment of the examiner.

(2) Test results in the form of raw scores lack
meaning. The number of questions which an examinee
answers correctly is partly a function of
the number of items in the test. And since it is
impossible, for practical reasons, to make all tests
uniform in length, or to attach uniform weight to
all test items, raw scores on two different tests cannot
be compared directly. A raw score of 50 on a
reading aptitude test, for example, may be as different
from a raw score of 50 on a mechanical aptitude
test as is 50 acres from 50 war bonds. It is
necessary to translate raw scores into values which are
comparable. It is necessary, in other words, to
find a “medium of exchange” for the field of mental
measurement. Such a medium would make it possible
to compare all measurable characteristics, just
as money makes it possible to compare the wealth
of the man who has 50 war bonds with that of the
man who has 50 acres.




Figure 18. Distribution of test scores. The figures along the
base line indicate the range of scores from lowest to highest.
Each block represents a man and is placed above the score
(or score interval) indicating his performance on the test.
Thus, two men scored in the interval 10-14, four in the interval
15-19, etc. A line drawn to connect the tops of each
of these columns of scores shows the general shape of the
distribution.

(a) If the scores made by all persons of a group
are plotted, the resulting figure is called a score distribution
or a distribution of scores. The method of
plotting is illustrated in figure 18.
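
The method of plotting may be sketched as a tally of scores by interval. The scores below are invented, chosen only so that two men fall in the interval 10-14 and four in the interval 15-19 as in figure 18. The interval width of 5 is likewise an assumption drawn from that caption.

```python
from collections import Counter


def tally_distribution(scores, width=5):
    """Count how many scores fall in each interval of the given width."""
    counts = Counter((s // width) * width for s in scores)
    return {(low, low + width - 1): counts[low] for low in sorted(counts)}


# Invented scores for illustration.
scores = [11, 13, 15, 16, 18, 19, 21, 22, 24]

# Print one block (#) per man above his score interval, turned on its side.
for (low, high), men in tally_distribution(scores).items():
    print(f"{low:>3}-{high:<3} {'#' * men}")
```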

(b) Most distributions of test scores plotted 
from data obtained on large unselected samples 
approximate that shown in figure 19. This curve 
is symmetrical and bell-shaped. A vertical line 
drawn through its highest point divides its area 
into two equal parts; and the point where this line 
intersects the base line is the mean or average score
of the distribution. This curve, called the normal 
curve of distribution, is actually a mathematical ab- 
straction. It is, however, sufficiently typical of 
most curves representing Army test scores to be 
useful as a basis of comparison. Moreover, the 
percentage of men who stand higher or lower than 
any given point on a normal curve closely approxi- 
mates the proportion above and below the same 
point on distribution curves of Army test scores. 
The normal curve thus has great usefulness in 
analysis and prediction. 

(c) Several characteristics common to every 
distribution of scores should be noted. The width 
of the curve along the base line indicates the range 
of scores in the group tested. The height of the 
curve at any point above the base line indicates the 
number of persons receiving that particular score. 
The area bounded by the curve and the base line
represents the number of cases (N) in the group 
tested. 

(d) The shape which a distribution of raw test
scores will assume depends upon several characteris- 
tics of the test and the group tested. If a test con- 
tains a large number of items, the distribution of 
its raw scores will extend over a wider range than 
will be the case for a shorter test. Moreover, de- 
pending upon the number of items and their diffi- 
culty, the position of the average score will vary for 
different tests. Thus, a particular raw score may 
be better-than-average for one test and below the 
average for another. Under these circumstances 
it is obvious that raw scores are not comparable or 
meaningful. 

b. Army Scores. (1) All Army scores are 
given the same average and the same spread or 
range. The distribution curves, therefore, have the 
same shape; and points in the same relative posi- 
tions (above or below average) have the same score 
values. Army scores are, therefore, comparable 
and their meaning in terms of relative ability is 
made clear. Distributions of scores for each test
are given the same range and reference point by
translating the raw scores into scores on a standard
scale. This scale is developed by selecting the average
of the distribution as a base point, and expressing
the distance (or deviation) of each score
from that point in terms of the deviations of all 
other scores. A mathematical expression of the 
deviations of all scores from the average of a dis- 
tribution is found as follows: 

(a) Subtract the average score from each raw score.

(b) Square each of the remainders, or deviations,
so obtained.

(c) Add all of the squared deviations together.

(d) Divide this sum by the total number of
scores.

(e) Extract the square root of this quotient.

(2) The quantity thus obtained is called the
standard deviation of the distribution. It is, in
effect, a standard unit with which the deviations
of all individual scores may be compared. Thus,
any raw score can be converted into a standard
score by computing its difference from the average
raw score and dividing by the standard deviation
of the raw scores. Figure 19 shows a normal curve
of distribution with the average and standard deviations
indicated.

Figure 19. A normal distribution with average (shown by
arrow) and standard deviations indicated. A score at
A, regardless of its size in raw score terms, has a standard
score value of +1.5 since it is halfway between one and
two standard deviation units ABOVE the average. Likewise,
B represents a standard score of -0.5 since it is half
of a standard deviation unit BELOW the average. Each
man's standard score indicates his relative standing in
the group, since the mathematics of the normal curve allow
determination of the percentage of the total area bounded
by any two standard scores. The numbers in parentheses
show some of these percentages. It can be seen that the
score at A is exceeded by 7% of all scores in the group,
and that the score at B is higher than the scores of 31%
(15 + 14 + 2). Furthermore, 62% of the group score
between A and B (19 + 34 + 9). Interpretations in
these terms are meaningful.
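
Steps (a) through (e) and the conversion described in (2) may be sketched as follows. The raw scores are invented for illustration. The last lines use the normal curve from the Python standard library to check the percentages quoted in the caption to figure 19 for standard scores of +1.5 and -0.5.

```python
from math import sqrt
from statistics import NormalDist


def standard_deviation(raw_scores):
    average = sum(raw_scores) / len(raw_scores)
    deviations = [x - average for x in raw_scores]  # step (a)
    squared = [d * d for d in deviations]           # step (b)
    total = sum(squared)                            # step (c)
    quotient = total / len(raw_scores)              # step (d)
    return sqrt(quotient)                           # step (e)


def standard_score(raw, raw_scores):
    """Difference from the average raw score, divided by the standard deviation."""
    average = sum(raw_scores) / len(raw_scores)
    return (raw - average) / standard_deviation(raw_scores)


raw_scores = [2, 4, 4, 4, 5, 5, 7, 9]  # invented data, average 5
print(standard_deviation(raw_scores))  # 2.0
print(standard_score(9, raw_scores))   # 2.0

# Areas under the normal curve for point A (+1.5) and point B (-0.5).
curve = NormalDist()  # average 0, standard deviation 1
above_a = 100 * (1 - curve.cdf(1.5))
below_b = 100 * curve.cdf(-0.5)
between = 100 * (curve.cdf(1.5) - curve.cdf(-0.5))
print(round(above_a), round(below_b), round(between))  # 7 31 62
```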

c. Army Standard Score Scale. The standard
score scale is so short (usually from about -3
to +3) that in order to make a fine enough differentiation
between men, it is necessary to resort to
decimal scores. Another inconvenience is that all
scores below average are negative in sign. The
Army, therefore, multiplies each standard score by 
20 in order to get rid of the decimals, and adds 100 
to each score to get rid of the negative signs. Thus, 
the Army standard score scale designates the average
as 100, and each standard deviation is represented
by 20 score points. Army standard scores,
therefore, range from approximately 40 to 160 with 
about 68 percent of all cases falling between 80 
and 120. (See fig. 20.) 
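
The arithmetic of this paragraph may be sketched as follows. The function name is hypothetical, and rounding to a whole number is an assumption made here for the purpose of dropping decimals. The last lines check the statement that about 68 percent of all cases fall between 80 and 120 on a normal curve.

```python
from statistics import NormalDist


def army_standard_score(z):
    """Multiply by 20 to drop decimals, add 100 to drop negative signs."""
    return round(100 + 20 * z)


print(army_standard_score(1.5))   # 130
print(army_standard_score(-0.5))  # 90
print(army_standard_score(-3), army_standard_score(3))  # 40 160

# Share of a normal distribution lying within one standard deviation
# (20 points) of the average (100).
army_scale = NormalDist(mu=100, sigma=20)
share = army_scale.cdf(120) - army_scale.cdf(80)
print(round(100 * share))  # 68
```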

d. Conversion Tables. The Army provides 
tables with its tests so that the raw scores recorded 
at any time and place can be readily converted into 
Army standard scores. Each of these tables is 
based upon the distribution of scores achieved on 
the test by the standardization population. (See 
par. 36.) Thus, when converted, the score of a
particular individual at a particular installation is
compared with a very large and representative group
of Army men, and each individual’s rank, as regards
the trait measured, is thus given in terms of the
Army as a whole. All individuals are compared
with the same standards. Army standard scores 
obtained by use of authorized conversion tables 
are, therefore, much more useful in classification 
than would be such scores derived by direct statis- 
tical computation from local data collected at any 
single installation. Moreover, the use of these 
conversion tables insures uniformity in translating 
raw into standard scores. 
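
How such a table might be drawn up from a standardization population can be sketched as follows. This is a sketch under stated assumptions, not the procedure actually used: it assumes a simple linear rescaling of raw scores by the average and standard deviation of the standardization group, and the sample shown is invented and far smaller than any real one.

```python
from statistics import mean, pstdev


def build_conversion_table(standardization_scores, max_raw):
    """Map every possible raw score 0..max_raw to an Army standard score."""
    average = mean(standardization_scores)
    sd = pstdev(standardization_scores)  # divides by N, as in the steps of par. 61b
    table = {}
    for raw in range(max_raw, -1, -1):  # descending order of raw score
        table[raw] = round(100 + 20 * (raw - average) / sd)
    return table


# Invented standardization sample with an average of 14.
sample = [10, 12, 14, 16, 18]
table = build_conversion_table(sample, max_raw=20)
print(table[14])  # 100: the average raw score converts to 100
print(table[18])  # 128
print(table[10])  # 72
```

Because the table is fixed by the standardization group, the same raw score converts to the same standard score at every installation, whatever the local average may be.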



62. Advantages of Army Standard Scores 

Army standard scores are most useful for classifica- 
tion and assignment purposes by reason of the fol- 
lowing specific advantages. They state test per- 
formance in such a way that small individual dif- 
ferences in potential ability or achievement are 
clearly revealed. They tell how a man ranks in 
comparison with other Army men. They make it 
possible to compare an individual’s expected per- 
formance with that of others, and to compare each 
man’s performance on any number of tests. They 
are mathematically convenient and, therefore, make 
further statistical analysis of data more efficient. 

63. Army Grades 

a. For many purposes of military classification, 
the Army standard score scale is a more refined 
measuring device than is required. In practice, 
a statement of the general range of potential abil- 
ities or achievements within which a man falls is 
in many cases sufficient. Army grades provide 
such a general statement. They give a rough, 
handy indication of an individual’s position in the
Army-wide distribution of the characteristic con- 
cerned. Figure 20 indicates the meaning of Army 
grades in terms of Army standard scores and also in 
standard deviations from the average. Note that
each Army grade includes the same range of standard
scores,* but that the proportion of men is largest
in Grade III, smaller in Grades IV and II, and
smallest in Grades I and V. It must be borne in
mind that the divisions between Army grades are
arbitrary; they may separate men of very similar
abilities. For example, a man whose score places
him barely within the Grade III range will differ
more from a high Grade III man than from a man
near the upper limit of Grade IV. Moreover, a great
many men belong in several different grades since
they have more skill and aptitude for some assignments
than for others. It is, therefore, a serious error to
think of men as “Grade V,” “Grade III,” etc., except
when they are being considered for a particular
assignment in terms of the tests applicable to that
assignment. It is equally erroneous to roughly and
subjectively “average” a man’s scores on different
tests and thus classify him in a hypothetical, composite
grade which has been neither developed nor authorized
by the Army. A great many assignments require
specific skills and aptitudes in the degree designated
by directive, and performance on other irrelevant
tests cannot be substituted.

*In order to compensate for irregularities in the distribution of scores, the limits for the Army Grades IV and V on the Army
General Classification Test have been adjusted. These limits are: Grade V = 42-59; Grade IV = 60-89 (Army standard scores).
This variation holds only for the AGCT.

Figure 20. Army standard scores, Army grades, and percentile ranks in relation to the normal curve of distribution.

Figure 21. Conversion of raw scores to standard scores. Raw scores are not comparable. For example, a raw score of 65 on the Trade
Information Test (TC-1a) is above the average; the same raw score on the Cryptography Test (TC-4a) is very low. This is shown
clearly when both raw scores are converted to Army standard scores.

Figure 22. Army grade distributions of AGCT scores. The first figure shows the distribution of men taken into the Army through
June 1944. The second figure shows the distribution of scores of men in the Army as of that date. The differences in these two
figures are in part accounted for by discharge of men in the lower grades.
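
The reading of an Army grade from a standard score may be sketched as follows. The general grade limits used here (130, 110, 90, and 70) are an assumption: the text says only that each grade covers the same range of standard scores, and the limits themselves appear in figure 20, which is not reproduced. The AGCT lower limit of 60 for Grade IV is taken from the footnote; the upper limits of the AGCT table are assumed to match the general ones.

```python
# Lower limit of each grade, highest first; anything lower is Grade V.
# ASSUMPTION: general limits are one standard deviation (20 points) wide.
GENERAL_LIMITS = [(130, "I"), (110, "II"), (90, "III"), (70, "IV")]
# From the footnote: on the AGCT, Grade IV = 60-89 and Grade V = 42-59.
AGCT_LIMITS = [(130, "I"), (110, "II"), (90, "III"), (60, "IV")]


def army_grade(standard_score, limits=GENERAL_LIMITS):
    for lower_limit, grade in limits:
        if standard_score >= lower_limit:
            return grade
    return "V"


# A man at 91 is closer in score to a man at 89 than to one at 109,
# yet the grade boundary separates him from the former only.
print(army_grade(91), army_grade(109), army_grade(89))  # III III IV
print(army_grade(65), army_grade(65, AGCT_LIMITS))      # V IV
```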

b. Another type of score sometimes employed 
to express test results is the percentile score. A per- 
centile may be defined as a figure expressing the 
percentage of scores in the population being con- 
sidered which is exceeded by the particular score 
for which interpretation is sought. Thus, a score 
which equals or exceeds 90 percent of all scores 
made by men tested for radio code-learning aptitude 
is a score at the 90th percentile. The relation be- 
tween percentile scores and Army standard scores 
and grades is illustrated in figure 20. 
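
A percentile of the kind described above may be computed as sketched below. The population of scores is invented, and the function follows the "equals or exceeds" wording of the example. The last lines show, for a normal curve only, the percentile that corresponds to an Army standard score of 120.

```python
from statistics import NormalDist


def percentile_rank(score, population):
    """Percentage of scores in the population that the score equals or exceeds."""
    at_or_below = sum(1 for s in population if s <= score)
    return 100 * at_or_below / len(population)


population = [70, 80, 85, 90, 95, 100, 105, 110, 120, 130]  # invented scores
print(percentile_rank(120, population))  # 90.0

# On a normal curve, one standard deviation above average (120)
# stands at about the 84th percentile.
army_scale = NormalDist(mu=100, sigma=20)
print(round(100 * army_scale.cdf(120)))  # 84
```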



64. Army Standard Scores Are NOT “IQ’s” 

Army standard scores bear no relationship to such 
concepts as the “IQ” (Intelligence Quotient), or 
“MA” (Mental Age), and Army test results will 
not be interpreted in terms of these concepts. (See 
par. 19, TM 12-425.) 

Section II. SIGNIFICANCE OF TEST SCORES

65. General 

a. To grasp the full significance of a test score, 
it is necessary to understand something more than 
the fundamental basis of the Army standard score 
scale. That a soldier may rank in the top 10 per- 
cent of all Army men in performance on a particu- 
lar test is factual information. That he has brown 
eyes and wears a size 7M hat is also factual informa- 
tion. But the Army is not concerned with the col- 
lection of odd scraps of biographical data. Eye 
color and head size, to be sure, may be valuable 
clues to identification. But a test score, whether 
high or low, remains an isolated and useless fact
until its significance is fully realized. And the
first step in realizing this significance is to know
what the test measures.

b. It is impossible to assign all Army tests titles 
which give a clear understanding of what is meas- 
ured by them. The Clerical Aptitude Test meas- 
ures aptitude for clerical work; and the Cryptog- 
raphy Test evaluates either aptitude or skill in 
cryptography. But one may well ask: What 
about the Army General Classification Test? For 
what does a high score on the Qualification Test 
qualify? Titles that seem the most specific are 
apt to be the most misleading. Clerical assign- 
ments, for example, are of many kinds and varieties; 
the Clerical Aptitude Test does not have equal sig- 
nificance for them all, and for some has no predictive 
value whatever. 

66. Test Scores Predict Performance 

a. It has been repeatedly emphasized that Army 
tests are designed to meet specific selection problems.
It has been shown that each test measures
certain skills or abilities because these skills or abil- 
ities are essential for successful completion of train- 
ing courses or adequate performance of duty assign- 
ments. Army tests are used, therefore, because 
they predict performance in advance of selection or 
assignment. To comprehend the full meaning or
significance of a test score, then, it is necessary to
understand precisely what performance the score
predicts, and how well it predicts that performance.
Both of these questions can be answered only after
field trials and statistical analysis of the results.

Figure 23. The test samples the abilities of the individual and
the criterion rating estimates performance on the job. The
correlation between the test score and the criterion rating
thus indicates the accuracy with which the test score will
measure a soldier’s ability to perform a job.

b. A review of the manner in which a test is 
developed (ch. 3) will give a partial answer to the 
question of what kind of performance the test score 
predicts. This review will show that test items are 
selected because they are related to particular as- 
pects of performance on certain job assignments — 
related, that is, in the sense that the items are an- 
swered correctly by the same persons who rate high 
in those aspects of performance on the job. It will 
be recalled that these phases of job success that
are employed as standards of comparison in the 
selection of test items are called criteria of success. 
Since the test was built in this fashion, it can be 
assumed that each individual's score will predict 
his standing on the particular criteria of job per- 
formance employed. For example, if a test is 
constructed by selecting items which prove to be 
related to code-receiving speed, then it can be shown 
that scores on the test will predict speed in receiving
code after a certain period of training. It should
be noted, however, that the test cannot be assumed 
(in the absence of proof) to measure any other cri- 
terion of radio operation — though this often proves 
to be true. This is one reason why the review of 
the paragraphs on test construction gives only a 
partial answer to the question of just what per- 
formance the test score will predict. Another rea- 
son is that some tests, like the Army General Classi- 
fication Test, are not designed to measure a single 
skill. Such tests can be used to evaluate chances 
for success in a large number of jobs or training 
courses which involve “general learning ability.”
They may do a better job of predicting performance 
in some of these cases than in others. So the best 
and most logical way to determine what a test 
measures is to make a comparison between the per- 
formance of men on the test and performance of 
the same men on actual assignments or in training 
courses. The statistical statement of this compari- 
son, then, will indicate how well the test perform- 
ance predicts job performance. If those men who 
score high on the test are also successful on the job, 
the test is said to be a valid test for purposes of pre- 
dicting performance on that particular job. If the 
relationship is a consistent and close one, the test 
is said to have high validity for that purpose. It is 
important to recognize that the same test may have
high validity for some purposes, low validity for others,
and under other circumstances, no validity at all.
Before employing any test result for assignment to
any job or training course, the classification officer
should satisfy himself that the test is valid for that
purpose. Test validities are specific; they are evidenced,
in each case, by demonstrated relationships
between test scores and criteria of job success.

Figure 24. Test scores are employed to predict criterion ratings. The more nearly the men who get the same test score tend to achieve
the same criterion rating, the more valid the test, and the more accurate the prediction.

67. Validity Coefficient 

The degree to which performance on the test com- 
pares with performance on the criterion can be por- 
trayed by plotting both on a graph. 

a. The process is somewhat analogous to plot- 
ting a distribution of scores; but since there are 
two scores for each person (test score and criterion 
score), it might be termed a double or square dis- 
tribution. The way in which this graph is con- 
structed is shown in figure 25. The mathematical 
statement of this relationship between test score 
and criterion score is called the validity coefficient. 
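
The text does not name the formula. The sketch below assumes that the validity coefficient is the ordinary product-moment (Pearson) correlation between test scores and criterion scores. The two small sets of data are invented to reproduce the extreme cases of a perfect and a zero relationship.

```python
from math import sqrt


def validity_coefficient(test_scores, criterion_scores):
    """Product-moment correlation between paired test and criterion scores."""
    n = len(test_scores)
    mean_t = sum(test_scores) / n
    mean_c = sum(criterion_scores) / n
    covariation = sum(
        (t - mean_t) * (c - mean_c) for t, c in zip(test_scores, criterion_scores)
    )
    spread_t = sum((t - mean_t) ** 2 for t in test_scores)
    spread_c = sum((c - mean_c) ** 2 for c in criterion_scores)
    return covariation / sqrt(spread_t * spread_c)


test = [80, 90, 100, 110, 120]   # invented Army standard scores
same_rank = [1, 2, 3, 4, 5]      # criterion ratings in the same order
unrelated = [3, 2, 5, 2, 3]      # criterion ratings with no relation

print(round(validity_coefficient(test, same_rank), 2))  # 1.0
print(round(validity_coefficient(test, unrelated), 2))  # 0.0
```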

b. The validity coefficient is extremely valuable. 
It tells, first of all, whether the test has enough validity
to be useful. In general, it can be said that
any test which saves the Army time, materials,
manpower, and money is sufficiently valid so long 
as it cannot be replaced by a better test. Often a 
test of comparatively low validity is preferable to 
the alternative of purely chance or random selection 
and assignment. Secondly, the size of the validity 
coefficient makes it possible to predict the approxi- 
mate criterion score of men receiving any given test 
score. Thus, from a knowledge of a man’s score on 
a valid test, it is possible, within the limits of prob- 
ability, to predict how successful he will be on the 
job, or to compute his chances of becoming average 
or better on the job or training course.
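
One common way of making such a prediction is sketched below. It rests on assumptions that the text does not state: that test and criterion scores are jointly normal, that the predicted criterion score (in standard deviation units) is the validity coefficient times the test score (in the same units), and that the scatter around the prediction shrinks as the coefficient rises. The coefficient of 0.5 is invented.

```python
from math import sqrt
from statistics import NormalDist


def predicted_criterion_z(test_z, r):
    """Predicted criterion score in standard deviation units."""
    return r * test_z


def chance_average_or_better(test_z, r):
    """Percent chance that the criterion score is at or above average."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    scatter = sqrt(1 - r * r)  # spread of criterion scores around the prediction
    return 100 * (1 - NormalDist(r * test_z, scatter).cdf(0.0))


# A man with an Army standard score of 120 is one standard deviation above average.
test_z = (120 - 100) / 20
print(predicted_criterion_z(test_z, r=0.5))            # 0.5
print(round(chance_average_or_better(test_z, r=0.5)))  # 72
```

With a coefficient of zero the chance remains 50 in 100 whatever the test score, which is the case of purely random selection.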

68. Expectancy Charts 

The research work and statistical computations involved
in determining the validities of tests for
various duty assignments are intricate and require
long training. It is not expected that this can be
done by classification personnel in the field. Much
of this work is executed by personnel procedures
officers and expert technicians, and the more important
results are made available in readily interpreted
and usable form. For the most part,
these data are presented in the form of expectancy
charts. A number of these charts are provided in
chapters 8 and 9 of this manual, and their use is
discussed more fully there. In effect, the expectancy
chart reveals in graphic terms the expected
job performance of men receiving various test scores.
More specifically, it shows the probabilities
(chances in 100) of better than average success on
particular job or training assignments for men receiving
various scores on particular tests. The
importance of these charts in connection with the use of tests in
classification and assignment should be obvious.

Figure 25. Test scores plotted against criterion scores to
show degree of correlation. In diagram (a), one point
is plotted. The triangle on the line labeled ‘test score’
represents the test performance of an individual. The
circle shows the criterion score of the same man. The
dotted lines are drawn perpendicularly to the two base
lines at these two score points. These dotted lines intersect
in a single point which represents the individual’s
score on both test and criterion. The remaining diagrams
show graphs with 4 or 6 points plotted. In (b),
where the four individuals each have the same rank on
both the test and criterion scores, it can be seen that the
plotted points all fall on a diagonal straight line. This
indicates a ‘perfect’ correlation. In (c), there is no relation
between the two sets of scores, and the plotted points
form more or less of a circle. This indicates a very low
or ‘zero’ correlation. Usually the degree of relationship
will be somewhere between these extremes and the plotted
points will form an ellipse as in (d). The narrower this
ellipse, the more it approaches a single line, the higher
the correlation. The wider the ellipse, the more it approaches
a circle, the lower the correlation.
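
The figures behind an expectancy chart of the kind described in paragraph 68 might be tabulated as sketched below. The records, the score bands, and the function name are invented for illustration. "Better than average" is taken here to mean a criterion rating above the average rating of the whole group, which is an assumption.

```python
def expectancy_chart(records, bands):
    """records: (test score, criterion rating) pairs; bands: (low, high) score ranges.

    Returns, for each band, the chances in 100 of a better-than-average rating.
    """
    average = sum(rating for _, rating in records) / len(records)
    chart = {}
    for low, high in bands:
        ratings = [rating for score, rating in records if low <= score <= high]
        if ratings:
            successes = sum(1 for rating in ratings if rating > average)
            chart[(low, high)] = round(100 * successes / len(ratings))
    return chart


# Invented records: (Army standard score, criterion rating).
records = [
    (75, 2), (80, 6), (85, 3), (88, 4),
    (95, 4), (100, 6), (105, 5), (108, 7),
    (115, 7), (120, 4), (125, 6), (128, 8),
]
bands = [(70, 89), (90, 109), (110, 129)]
print(expectancy_chart(records, bands))
# {(70, 89): 25, (90, 109): 50, (110, 129): 75}
```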

69. Critical Scores 

a. For purposes of assignment, field work and 
statistical analysis may result in the stipulation of
certain critical scores which must be achieved or 
surpassed by every man selected. As stated pre- 
viously (par. 37), critical scores are set at a point 
dictated by practical necessity. If the supply of 
men is large in relation to the demand, the critical 
score may be set so high that nearly all the men se- 
lected will succeed. In other words, Army tests 
may predict success very accurately if the critical 
score is set high enough. The manner in which 
critical scores are adjusted to meet Army needs is 
illustrated in figure 26. 




TEST SCORES 

Figure 26. Proportions of satisfactory and unsatisfactory men 
who would be selected by various critical test scores when the 
validity is fairly high. If the critical score is set at A, only 
the men represented by the areas 1 and 2 will be accepted, 
and nearly all of these will be successful (1). If more men 
are needed, the critical score may be lowered to point B. 
Then an added group of men (areas 3 and 4) will be ac- 
cepted. Most of these will be successful (3), but an appre- 
ciable number will fail (4). If the critical score is lowered 
still further (to C), the additional men admitted (areas 5 
and 6) will be about equally divided between satisfactory 
and unsatisfactory. Lowering the critical score inevitably 
means that a larger number of failures will be selected, but 
the Army must sometimes countenance this waste in order 
to obtain the requisite number of qualified men. It should 
be noticed that even when the critical score is lowered to C, 
the total number of satisfactory men selected (areas 1, 3, 
and 5) is considerably in excess of the total number selected 
who are unsatisfactory (areas 2, 4, and 6). The critical 
score is often called the ‘cut-off point,’ since men who score 
below it are not considered for the assignment in question. 
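
The effect of moving the critical score, as pictured in figure 26, can be reproduced with a simple count. The sketch below is illustrative only: the twelve men, their scores, and the three cut-off values standing for points A, B, and C are invented.

```python
def selection_outcome(records, critical_score):
    """Count satisfactory and unsatisfactory men at or above the cut-off.

    records -- list of (test_score, satisfactory) pairs, where
               satisfactory is True or False
    """
    selected = [ok for score, ok in records if score >= critical_score]
    good = sum(selected)
    return {"selected": len(selected),
            "satisfactory": good,
            "unsatisfactory": len(selected) - good}


# Invented group of 12 men; higher scorers succeed more often.
records = [(140, True), (135, True), (130, True), (120, True),
           (115, False), (110, True), (105, True), (100, False),
           (95, True), (90, False), (85, False), (80, True)]

at_a = selection_outcome(records, 130)  # high cut-off (point A)
at_b = selection_outcome(records, 105)  # lowered (point B)
at_c = selection_outcome(records, 80)   # lowered further (point C)
```

Each lowering of the cut-off admits more men and more failures, yet in this invented group the satisfactory men still outnumber the unsatisfactory at the lowest cut-off.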

b. The situation arising from selection on the 
basis of critical scores is often complicated by the 
fluidity of standards of success. Since men are 
usually considered satisfactory if they are average 
or better and unsatisfactory if below average, it 
follows that standards will be adjusted to the av- 
erage level of the group. When critical scores are 
lowered, thus lowering the average level of the 
group, standards of performance on the assignment 
tend to decline and a larger-than-expected number 
of men will be considered satisfactory. Conversely, 
when critical scores are raised, standards of per- 
formance tend to rise and fewer men will succeed 
than the test scores would indicate. The exact fig- 
ures in expectancy charts, therefore, should be ex- 
pected to apply only when course or job standards 
remain fixed. (See par. 119.) 

Section III. RELIABILITY 

70. Definition 

An ideal measuring instrument would yield equally 
accurate results every time it was used. Such an 
instrument would be perfectly reliable. This ideal 
is often approached, but never, even in physical 
measurement, completely attained. There are 
some imperfections in every instrument. More- 
over, there are exceedingly small but discernible 
changes in the materials to which the instrument 
is applied in the case of physical measurement, 
changes resulting from variations in temperature, 
moisture, pressure, etc. With psychological meas- 
uring instruments, the situation is somewhat aggra- 
vated. The test itself often falls farther short of 
perfection than is usual with physical instruments. 
But the variations in the men measured, due to 
such influences as fatigue, previous testing with 
the same or a similar test, lack of interest or effort, 
and variations in the manner of administering the 
test from time to time, or examiner to examiner, 
contribute most to the unreliability of test results. 
These latter factors, which originate outside of the 
test itself, can be largely controlled by a strict ad- 
herence to the regulations and directions concern- 
ing the administration of all Army tests. (See 
ch. 4.) 

71. Estimating Reliability of a Test 

a. General. Allowance must be made, how- 
ever, for the small proportion of those factors ex- 
traneous to the test which cannot be controlled, 
and for the imperfections of the test itself which 
make for less than perfect reliability. It is thus 
necessary to determine just how reliable the test is 
in order to know how much unreliability to allow 
for. Army tests are subject to rigorous and ex- 
haustive statistical checks before being released to 



determine the accuracy of the measures which may 
be obtained with them. 

b. Method. Comparing the scores achieved by 
men who take the same test two or more times 
under identical circumstances, or who take exactly 
equivalent forms of the test at one time, would 
show whether the test is reliable. For, if the same 
man gets the same score each time or on each equiv- 
alent form, the test must certainly be reliable. 
Whatever trait it measures, it measures consist- 
ently; but if many individuals get widely dif- 
ferent scores each time, it must be unreliable, and 
no confidence can be placed in any single score. 
It is impossible, however, to give the test more 
than once under identical circumstances. Besides 
the obvious effects of familiarity, there are bound 
to be other changes in the examinees from time 
to time. And it is usually too difficult (if not 
impossible) to construct an exactly equivalent form 
of the test. In practice, therefore, the Army makes 
use of a statistical technique* which estimates the 
result that would obtain from the administration 
of exactly equivalent forms of the test at a single 
session. The technique is accurate for most types 
of tests and has the important advantage that it 
saves both time and effort. The mathematical 
statement of the result is called the reliability 
coefficient. 
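
The footnote to this paragraph cites Kuder and Richardson, but the manual does not say which of their formulas is applied. The sketch below assumes formula 20, the form most often quoted, for items scored simply right or wrong. The answer data are invented, and the use of divisor n in the variance is likewise an assumption of this sketch.

```python
def kr20(item_scores):
    """Kuder-Richardson formula 20 for items scored 1 (right) or 0 (wrong).

    item_scores -- one list per man, one 0/1 entry per item
    """
    n_men = len(item_scores)
    k = len(item_scores[0])
    totals = [sum(man) for man in item_scores]
    mean_total = sum(totals) / n_men
    # Variance of total scores (divisor n is assumed here).
    variance = sum((t - mean_total) ** 2 for t in totals) / n_men
    # Sum over items of p*q, where p is the proportion passing the item.
    pq = 0.0
    for i in range(k):
        p = sum(man[i] for man in item_scores) / n_men
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / variance)


# Invented answers of 5 men on a 4-item test.
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = kr20(answers)
```

The coefficient needs only one administration of one form, which is the saving in time and effort mentioned above. For this small invented set it works out to about .80.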

72. Use of Reliability Coefficient 

The reliability coefficient gives important infor- 
mation about the test and the interpretation of 
test scores. 

a. Improving Test Reliability. The first 
use of the reliability coefficient is to determine 
whether the test is reliable enough for classification 
purposes. If the reliability proves to be too low 
it can usually be increased by the addition of more 
items of the same kind. (See par. 25.) But since 
long tests are time-consuming, reliability beyond 
practical usefulness is not sought. All Army tests, 
before being released, are made sufficiently reliable 
for use in classification and assignment, provided 
they are used as directed in the manuals accompany- 
ing them and for purposes for which they were 
designed. 
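
The gain in reliability from adding items of the same kind is commonly estimated with the Spearman-Brown formula. That formula is not named in this manual and is introduced here only as an illustration; the reliability values are invented.

```python
def lengthened_reliability(reliability, factor):
    """Spearman-Brown estimate for a test lengthened `factor` times."""
    return factor * reliability / (1 + (factor - 1) * reliability)


def factor_needed(reliability, target):
    """Lengthening factor required to reach the target reliability."""
    return target * (1 - reliability) / (reliability * (1 - target))


doubled = lengthened_reliability(0.60, 2)   # about 0.75
needed = factor_needed(0.60, 0.90)          # about 6 times the length
```

Under these assumptions, doubling a test of reliability .60 raises it to about .75, but reaching .90 would take a test about six times as long. This is one way to see why reliability beyond practical usefulness is not sought.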

b. Interpreting Test Scores. (1) General. 
Because Army tests are reliable instruments, it is 
improbable that an individual's score will vary 
enough to present a practical problem in classi- 



*Kuder, G. F., & Richardson, M. W. The Theory of the Estimation of Test Reliability. Psychometrika, 1937, 2, 151-160. 
fication work. At the same time, it should be rec- 
ognized that very small differences between scores 
(5 or 6 standard score points, for example) should 
seldom be interpreted as reflecting real differences 
between the individuals receiving them. This 
statement does not imply that, where critical scores 
are stipulated, persons scoring only 5 or 6 points 
below these standards should be considered as 
passing. While it is true that a number of such 
people, if retested, might very well reach or exceed 
the critical score, it is equally true that a like 
number will score lower than before. It will be 
recalled that critical scores are computed on the 
basis of quotas and manpower supply as well as 
expectancies of success. Any laxity in applying 
these standards, such as that involved in the practice 
of making exceptions in individual cases that are 
“close,” can only upset quota calculations without 
at all increasing the validity of selection. 
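
The caution against reading meaning into differences of 5 or 6 points can be illustrated with the standard error of measurement. The sketch below assumes a standard deviation of 20 standard score points and a reliability coefficient of .90. Neither value is stated in this section, and both are used for illustration only.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


# Assumed values for illustration only.
sem = standard_error_of_measurement(sd=20, reliability=0.90)

# Standard error of the difference between two men's scores.
se_difference = sem * math.sqrt(2)
```

Under these assumptions a single score carries an error of roughly 6 points, and the difference between two scores an error of roughly 9 points, so a gap of 5 or 6 points lies well within chance variation.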

(2) Retesting. Because Army tests are reliable, 
it cannot be expected that retesting, by and large, 
will result in marked score increases. Familiarity 
with the test and the test situation may contribute 
a few points of increase, but experience proves that 
this increase, on the average, will be small. More- 
over, it is important to recognize that there is 
no virtue in getting high scores for their own sake. 
The only purpose in using tests at all is to enable 
predictions of job or training success, and there are 
seldom grounds for supposing the higher retest 
score to be a better predictor than the original— on 
the contrary, there usually is reason to suspect its 
validity. Therefore, except where there is marked 
discrepancy between a man’s test score and his 
abilities as inferred from other information (for 
example previous education and occupation) the 
practice of retesting should be discouraged. Re- 



testing is governed by the regulations contained in 

paragraph 18, TM 12-425. 

Section IV. SUMMARY 
73. General 

The following information is provided to aid classi- 
fication officers and other personnel in the correct 
interpretation and use of test data. 

a. Information Provided in Test Manuals 
Issued with Each Test. (1) Conversion tables 
which enable field personnel to translate raw scores 
into standard scores. The latter show the com- 
parison of each tested individual with all other Army 
men as to the particular skill or aptitude which the 
test measures. 

(2) Validity coefficients given in terms of the 
available criteria. These indicate the degree of 
probability that a man who achieves any given 
Army standard score will achieve a certain pre- 
dictable criterion score. That is, in more general 
terms, they show how accurately standard scores 
on a test predict success on the designated 
assignments. 

(3) Reliability coefficients which indicate the 
accuracy with which the test measures every tested 
man. 

b. Information Provided in Directions. 
Critical scores which must be achieved or exceeded 
by men assigned to specified training courses and 
duties. 

c. Information Provided in This Manual 
(ch. 8). Expectancy charts, which show the chances 
in 100 that men who achieve designated test scores 
(ranging from low to high) will achieve average or 
better performance on specified assignments. Ex- 
pectancy charts are provided with statistical data 
requisite to their proper interpretation. 
CHAPTER 6 

RATING SCALES, QUESTIONNAIRES, AND INTERVIEWS 



Section I. ORIENTATION 

74. General 

The tools of the Army classification system are 
techniques or instruments designed to aid in the 
evaluation of the characteristics of men in the 
Army. They are techniques employed to discover 
and to measure as accurately as possible what each 
man can do or is capable of learning to do. Wher- 
ever possible, scientific measuring instruments are 
used. (See ch. 2.) These instruments are the 
numerous tests that are described elsewhere in this 
manual and that have proven of unquestionable 
usefulness in the evaluation of skills, capacities, 
and aptitudes. 

a. There are, however, characteristics of men 
that enter into the determination of success in 
training or in assignment — personality traits and 
social adjustments which do not lend themselves 
as readily to measurement by tests. For example, 
no instrument has as yet been devised to test the 
traits of “leadership” in a satisfactory manner. 
Yet most men would agree that individuals differ 
with respect to the degree to which these traits 
are developed; and further, that these differences 
in “leadership,” properly rated, are important 
factors in classification and assignment. 

b. In those areas in which tests are available 
for evaluating skills and capacities, additional 
evidence is often useful to round out the data on 
individuals and thus make sound classification and 
assignment doubly sure. Information about pre- 
vious occupational experience, education, interests, 
and hobbies is of this nature. And such infor- 
mation is obtainable in useful form by a proper 
employment of the interview, the questionnaire, 
and the check list. 

75. Purpose and Scope of This Chapter 

This chapter is included to orient the testing officer 
in the other procedures used in the Army classifi- 
cation system, so that he will understand their 
relationship to testing as conducted in the Army, 
and the important parts played by each of the tools 
used in personnel evaluation. The techniques and 
methods described here are, of necessity, less form- 
alized than tests. A few special questionnaire forms 



and check lists have been developed and authorized 
for classification purposes. (See ch. 8.) The inter- 
view is employed not only in initial classification 
in the reception center (ch. 7), but in all echelons of 
the Army. It is the purpose of this chapter to de- 
scribe and evaluate these tools, and to provide 
suggestions that will aid in their development, use, 
and interpretation. The judicious use of these 
devices, together with the information derived 
from tests, will achieve a well-rounded program of 
classification and assignment. 

76. Value of Rating Scales 

A good rating scale enables one person, on the basis 
of adequate observation, to judge another and to 
assign to him quantitative evaluations of character- 
istics not at present amenable to scientific measure- 
ment through tests. Examples of such character- 
istics are physical appearance, bearing, leadership, 
manner of speech, etc. The rating scale also pro- 
vides a convenient means of obtaining estimates 
of proficiency or manner of performance in duty 
assignment when the finer discrimination provided 
by a test is either not feasible or not necessary for 
practical purposes. 

77. Value of Questionnaires and Check Lists 

Carefully constructed questionnaires and check 
lists are useful instruments for obtaining infor- 
mation about the background of the individual, 
his educational and occupational training and ex- 
perience. They may also be used to make in- 
ventories of personal characteristics such as interests, 
attitudes, and social reactions. Questionnaires of 
this sort are often elaborate devices that can be 
“scored” like a test. 

78. Value of the Interview 

The interview is the source of very important in- 
formation: as such it is an oral questionnaire. 
It may be used to coordinate and evaluate data 
from several sources according to its impact on the 
individual. It may be used as a medium for giving 
information, advice, guidance, or therapy. It may 
serve all of these purposes at the same time, and 
it may further, because of its essentially intimate 
RATING SCALE OF COMPETENCE IN PERFORMING A WORK PROJECT 

NAME ______________________________   ASN __________   CLASS __________ 
     Last        First       Middle 

Instructions to rater: Consider carefully each of the five descriptive 
paragraphs below; then, on the basis of your observation of the 
trainee whose name is entered above, decide which of these paragraphs 
best describes his work on the project and place a check-mark (✓) in 
front of that paragraph. 

□ (1) Could not complete job even with major assistance from instruc- 
      tor. Did not know the relative parts of his job either by 
      definition or use. Had no understanding of why the job was to 
      be done. 

□ (2) Was able with difficulty to complete parts of the job himself. 
      Had an idea what to do but lacked sufficient information or 
      dexterity to complete all parts of the job. Understood very 
      little of why he did the job. 

□ (3) Had a general idea of what was to be done. Finished the job 
      but with minor errors of omission or commission. Made false 
      starts, changes, and repetitions. Was not sure of himself or 
      his product. 

□ (4) Completed the required job with little hesitancy. Learned 
      what to do and understood generally the underlying principles. 

□ (5) Completed the job quickly and efficiently. Learned what to do, 
      why to do it, and the relationship of this job to others being 
      studied in the unit. 

RATER ____________________   RANK __________   ORGANIZATION __________ 

Figure 27. 
character, add the touch of personal attention that 
can do so much for morale. 

79. Comparison with Tests 

Interviews, questionnaires, check lists, and rating 
scales are valuable personnel instruments. As 
devices for obtaining accurate and impartial in- 
formation about abilities and aptitudes, they are 
inferior to tests. As techniques for guiding and 
objectifying informal data-gathering, they are far 
superior to unaided human judgment. To be of 
any real use, the questionnaire and rating scale 
must be worded and constructed with care. And 
if it is to serve any of its ends, the interview must 
make use of men who are friendly and patient and 
well trained in the arts of questioning and counsel- 
ling. At their worst, these techniques can introduce 
bias and error and decrease the validity of evalu- 
ation and classification. At their best, they can 
be extremely useful. But they should never be 
considered as substitutions for the scientific measure- 
ment by tests wherever tests have been designed 
for specific purposes. 

Section II. RATING SCALES 


80. Definition 

The rating scale may be cast in several forms. 
The proper form should be chosen according to the 
nature of the problem to be solved, the trait or 
traits to be evaluated, and the persons who are to 
make the ratings. Despite minor variations and 
elaborations, the rating scale is essentially a device 
for eliciting evaluative judgments of traits or 
qualities of individuals in such a manner that 
these judgments can be handled quantitatively. 
Evaluative judgments of traits or characteristics 
are constantly being made in all kinds of situations. 
The soldier makes informal judgments about the 
abilities of his fellows or the competence of his 
officers. The squad leader, sergeant, platoon 
leader, and company commander are constantly 
evaluating the military proficiency of their men. 
Commanders of higher echelons render periodic 
efficiency ratings on their officers. And, in schools 
and training centers, instructors pass judgment on 
the progress and competence of their trainees. 
Most of these judgments are couched in qualitative 
terms, like “good” or “average,” “industrious” or 
“lazy.” Such ratings, like the free-answer or essay 
type test response, are highly subjective, open to 
individualistic interpretation, and difficult to handle 



in meaningful fashion. The rating scale provides 
an objective framework for the evaluation. It 
offers the rater a nicely graded series of guideposts 
within which he may indicate his judgment by 
making a check mark or assigning a predetermined 
numerical “score.” Figure 27 is an example of a 
simple type of rating scale for evaluating the gen- 
eral competence of trainees in their performance 
of a specific task. 

81. Types of Rating Scales 

The rating may take several forms. 

a. Ranking. An obvious method of rating a 
group of individuals according to their possession 
of a particular trait or characteristic is that of rank- 
ing. This consists of assigning the number 1 to 
the person who stands highest in the group with 
respect to that characteristic, the number 2 to the 
person who stands second, and so on down through 
the whole list. Although this method is easily 
understood, it has serious disadvantages. It is 
exceedingly cumbersome when the number of 
persons to be rated is large. Moreover, it is im- 
possible to combine ratings on groups of various 
sizes since the numerical value assigned to each 
person— his position in the group — obviously de- 
pends on the size of the group. For example, the 
poorest person in a group of 5 receives the same 
“score” as a man in the top fourth of a group of 20. 
And again, the standard of reference of the ratings 
with this method is always the average of the group 
being rated rather than the average of all trainees, 
of all men in that assignment, or of all men in the 
Army. 
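
The dependence of a rank on the size of the group, noted in a above, can be shown by expressing each rank as a fraction of its group. The function name below is hypothetical and the conversion is not a procedure prescribed in this manual. It also does not remove the second objection, since the standard of reference is still the group being rated.

```python
def rank_position(rank, group_size):
    """Fraction of the group standing at or above this rank."""
    return rank / group_size


poorest_of_5 = rank_position(5, 5)     # 1.0: last in his group
fifth_of_20 = rank_position(5, 20)     # 0.25: within the top fourth
```

Both men carry the same rank number, 5, yet one stands last in his group and the other in the top fourth of his.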

b. Numerical Rating Scale. Some of the 
difficulties encountered in the method of ranking 
are remedied by means of the numerical rating scale 
or the scale of values. With this method a number 
of categories are set up in descriptive terms and 
assigned predetermined numerical values. Each 
of the persons rated is then placed in that one of the 
categories which best describes him, and given the 
numerical value or “score” that goes with it. 
Figure 27 is an example of this type of rating scale. 
Whether or not groups rated separately can legiti- 
mately and meaningfully be combined with this 
method depends upon whether or not all raters 
interpret the descriptive paragraphs in the same 
manner. In general, the reliability of the rating 
scale is impaired because the standards of different 
raters vary. 
Figure 28. 



c. Graphic Rating Scale. If the descriptive 
categories are represented as distances along a line, 
the scale becomes a graphic rating scale. Figure 
28 illustrates the technique applied to some seven 
characteristics that are considered important for 
officer candidates. The amount of each charac- 
teristic is represented as a distance along the line 
with descriptive phrases acting as milestones. The 
rater places a check mark at any point along the 
line which indicates his judgment. 



82. Construction of a Rating Scale 

The construction of a rating scale, like the construc- 
tion of a test (ch. 3), should start with a clear- 
cut formulation of the purpose to be served by the 
scale. This statement of purpose should encompass 
the nature of the selection problem which demands 
the evaluation of traits, the specific trait or traits 
to be evaluated, the population to be rated, and the 
persons who are to utilize the scale as raters. The 
exposition that follows has particular reference to 
the numerical scale and the graphic rating scale. 

a. Definition of Characteristic To Be 
Rated. To insure that all raters gain as nearly as 
possible the same understanding of just what 
characteristic of the individuals they are to judge, 
it is necessary to have a clear-cut and complete 
definition of this characteristic. If, for example, 
officers are asked to rate men with respect to “co- 
operation,” the task is difficult since each officer 
will have to decide for himself just what constitutes 
this trait. The description of the trait should be 
in terms of concrete behavior that can be observed 
rather than abstract qualities. The definition 
should thus be based on a “job analysis” and on 
observations of individuals displaying the quality 
to be evaluated. The following is an adequate 
definition of cooperation: 

Cooperation. Consider how well the soldier 
works with his supervising noncommissioned 
officers and other soldiers. Take into account 
whether he readily assumes assigned duties 
and responsibilities which may be inconvenient, 
whether he freely offers to help, how well he 
works at jobs requiring teamwork, etc. 

b. Number of Degrees in Scale. The next 
step is to decide how many degrees or steps the scale 
is to have. This will depend upon how well the 
trait is defined, how precisely and objectively it can 
be observed, and how many degrees of it can be 
discriminated. In some cases it will not be worth- 
while to attempt more than a three-step discrimina- 
tion such as “above average,” “average” and “be- 
low average.” Usually finer discrimination into 5 
or even 7 degrees is possible. 

c. Defining Positions on Scale. In both the 
numerical and the graphic types of scale, the various 
scale categories must be clearly defined. These 
are the guideposts of the scale, and like all guide- 
posts, should be explicit and unequivocal. They 
will achieve this necessary clarity if they are de- 
scribed in terms which refer to objective behavior, 
to activities of individuals displaying the varying 
degrees of the trait. With respect to the charac- 
teristic of “cooperation” cited above, for example, 
the following five categories might be used to define 
positions in the scale: 

(1) Frequently balks at doing inconvenient 
tasks; never puts himself out to help others; fre- 
quently in disagreement with other soldiers. 

(2) Avoids doing inconvenient tasks; does not 
get along well with other soldiers. 



(3) Carries out most duties and responsibilities 
assigned to him, although he occasionally tries to 
avoid inconvenient assignments; works with others 
fairly smoothly. 

(4) Readily assumes all duties and responsi- 
bilities which may be assigned to him; gets along 
well with others. 

(5) Goes out of his way to be helpful; volunteers 
for duties and responsibilities; works very well with 
others. 

d. “Halo” Effect. When a rater evaluates an 
individual with respect to a number of different 
traits, he is likely to rate in the same way on all of 
them regardless of their independence. Having 
formed a general impression of “excellence,” for 
example, he will tend to credit the individual with 
above-average standing on all of the characteristics 
included in the scale. This tendency to base ratings 
on a general overall impression, rather than on an 
independent consideration of each characteristic in 
turn, is called the “halo” effect. It must be avoided 
if the separate ratings are to have maximal use- 
fulness. There is no reason why all of the character- 
istics in the scale illustrated in figure 28 should 
necessarily be related. It is quite possible for a 
person to be able to handle men well, to express 
himself with facility, and yet be untidy in his dress 
and bearing. Yet if the officer using this scale has 
formed a general impression of the individual as an 
undesirable candidate or unlikely officer material, he 
will tend to place all of his check marks toward the 
lower end of the scale if desirable and undesirable 
characteristics are listed in columns. It will help 
to avoid this tendency to vary the alignment of the 
scale in such a way that the high or “good” ex- 
tremes fall as often at the left as at the right ends 
of the lines. This arrangement may help to force 
the rater into considering each scale separately; but 
the best way to avoid the effect of the “halo” is 
through the training of raters. 
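
When the alignment of the lines is varied as suggested above, the scorer must turn the reversed lines back so that a high number always stands for the favorable extreme. The following is a minimal sketch of that bookkeeping. The trait names, the layout, and the five-step form are hypothetical.

```python
def score_check(position, steps, good_end_left):
    """Return a score in which a higher number always means 'better'.

    position      -- step checked, counted from the left (1..steps)
    steps         -- number of steps on the line
    good_end_left -- True if the favorable extreme is printed at the left
    """
    return steps + 1 - position if good_end_left else position


# Hypothetical 5-step form on which alternate lines are reversed.
layout = {"bearing": False, "speech": True, "handling men": False}
checks = {"bearing": 2, "speech": 2, "handling men": 4}
scores = {trait: score_check(checks[trait], 5, layout[trait])
          for trait in checks}
```

On this form a check at the second step from the left counts 2 on an ordinary line but 4 on a reversed one.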

83. Preliminary Try-out of Scale. 

After the scale has been constructed in preliminary 
form, it should be subjected to a thorough try-out 
under real conditions. It can then be revised and 
adjusted in the light of this experience. The rating 
scale, like the test, should be checked for reliability 
and validity. Its reliability can be tested by 
determining the extent to which soldiers in a group 
are rated in the same way by independent raters, or 
by determining the consistency of the ratings made 
by a single judge on different occasions. The 
validity of the scale can be computed in the normal 
manner by determining the relationship between 
the ratings of a group and some independent cri- 
terion of success in training or on assignment. It 
should be evident that any rating scale, no matter 
how carefully constructed or how reliable it may be, 
is of little value if it does not provide evaluations 
that are related to success in training or on the job. 
The preference for measurement by tests, where 
these are available, is largely based on the fact that 
the reliability and validity of rating scales are 
generally somewhat lower than those of tests. 
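
Both checks described in this paragraph come down to computing a correlation. The sketch below uses the ordinary product-moment coefficient on invented ratings and invented course grades. The choice of coefficient and all of the figures are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented ratings of the same eight soldiers on a five-step scale.
rater_1 = [1, 2, 2, 3, 3, 4, 4, 5]
rater_2 = [2, 1, 3, 3, 4, 3, 5, 5]
# Invented criterion: final course grades of the same soldiers.
grades = [62, 70, 68, 75, 80, 78, 88, 91]

# Reliability: agreement between two independent raters.
reliability = pearson(rater_1, rater_2)

# Validity: relation of the averaged rating to the criterion.
averaged = [(a + b) / 2 for a, b in zip(rater_1, rater_2)]
validity = pearson(averaged, grades)
```

A scale may show good agreement between raters and still be of little value if the second figure, its relation to the criterion, is low.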

84. Administering and Scoring Rating Scale 

a. Abundant experience has demonstrated that 
ratings which are the result of the combined judg- 
ments of several individual raters are more reliable 
and valid than those obtained from single raters. 
It is, of course, obvious that the several ratings 
should be completely independent and not the 
result of previous discussion and collaboration 
among the judges. At least three independent 
ratings of each individual on each trait should be 
obtained if feasible. These can then be averaged 
to provide the best estimate of the amount of each 
trait the individual possesses. 

b. (1) Those persons who are to act as raters 
must be thoroughly oriented and trained if their 
evaluations are to be of maximal usefulness. The 
orientation should cover such factors as the purpose 
of the ratings to be obtained, the kinds of behavior 
to be observed, the avoidance of the “halo” effect, 
and the necessity for uniformity in the interpreta- 
tion of the various traits and the various positions 
on the scales. 

(2) The desired uniformity of standards can 
best be achieved if the raters bear in mind the 
approximate nature of the distribution of ratings 
that is desired. On a five-step scale, approximate- 
ly 40 percent of a large unselected group of indi- 
viduals would be expected to be rated in the “aver- 
age” or central position, with 23 percent just above 
and just below this position and 7 percent at each 
extreme. Untrained raters should be given prac- 
tice in observing behavior, evaluating traits, and 
apportioning ratings. 
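
The percentages given in (2) above can be turned into expected counts for a group of any size and compared with the ratings a rater actually turns in. The sketch below is a possible aid to rater training. The function names and the sample ratings are invented.

```python
# Approximate percentages given in the text for a five-step scale.
EXPECTED_PERCENT = [7, 23, 40, 23, 7]


def expected_counts(group_size):
    """Expected number of men at each step, lowest step first."""
    return [round(group_size * p / 100) for p in EXPECTED_PERCENT]


def departures(ratings):
    """Observed minus expected count at each step (steps 1 to 5)."""
    observed = [ratings.count(step) for step in range(1, 6)]
    expected = expected_counts(len(ratings))
    return [o - e for o, e in zip(observed, expected)]


counts_for_200 = expected_counts(200)

# A lenient rater: no low ratings and too many high ones.
lenient = [3] * 30 + [4] * 45 + [5] * 25
gaps = departures(lenient)
```

Large positive departures at the upper steps, as in this invented case, point to a rater whose standards are more lenient than the distribution intended. The comparison is meaningful only for a large unselected group, as the text states.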

c. “Scoring” Rating Scales. It is convenient 
to refer to the numerical value assigned to a rating 
as a “score.” The process of “scoring” a rating 
scale, therefore, is that of determining what that 
value should be. There are many possibilities. 
The simplest is to use the serial number of the 



category checked. If the third, or middle, category 
of a five-step scale is checked, for example, the 
“score” on that particular category would be 3. 
With the graphic rating scale, the rater may place 
a check-mark at any point along the line, and conse- 
quently the “score” can be the distance of this 
check-mark from the end of the line. By measuring 
this distance to the nearest inch or in any fraction 
thereof, any desired degree of discrimination may 
be “scored.” As stated above, several independent 
ratings of the same trait may be averaged for 
greater reliability. Where several different traits 
are rated, as in the scale illustrated in figure 28, the 
separate “scores” may be combined to give an 
overall rating of “leadership” or proficiency. They 
may be combined by simple addition, or multiplied 
by weights and added, depending upon the use to 
which they are to be put and their correlation with 
the criterion of success. Or the separate “scores” 
on each of the different traits may be plotted as a 
“profile” to provide a graphic presentation. (See 
fig. 29.) 
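
The arithmetic of averaging several raters and then combining traits can be sketched briefly. The trait names follow figure 29, but the ratings and the weights below are invented. Actual weights would depend on the use to be made of the ratings and on each trait's correlation with the criterion of success.

```python
def average_rating(ratings):
    """Average several independent ratings of one trait."""
    return sum(ratings) / len(ratings)


def composite(trait_scores, weights=None):
    """Add trait scores, optionally multiplying each by a weight."""
    if weights is None:
        return sum(trait_scores.values())
    return sum(weights[t] * s for t, s in trait_scores.items())


# Three independent raters, five-step scale; all figures invented.
raw = {"attitude": [4, 3, 5], "stability": [3, 3, 3],
       "ability to handle men": [5, 4, 3]}
trait_scores = {t: average_rating(r) for t, r in raw.items()}

simple_total = composite(trait_scores)
# Hypothetical weights, chosen only to show the arithmetic.
weighted_total = composite(
    trait_scores,
    {"attitude": 1, "stability": 1, "ability to handle men": 2})
```

The averaged trait scores in `trait_scores` are also the values that would be plotted, trait by trait, to form a profile.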



PROFILE OF STUDENT OFFICER RATINGS 

Instructions for filling out profile chart are given on back of card. 

Traits Rated: Military Bearing and Neatness; Physical Adaptability; 
Group Acceptability; Attitude; Ability to Handle Men; Stability. 

NAME __________   ASN __________   CLASS __________ 
AGE __________   EDUCATION __________   AGCT __________ 

Figure 29. 

Section III. QUESTIONNAIRES AND 
CHECK LISTS 



85. Definition 

When used as a classification instrument, a question- 
naire is a series of uniform questions of proved 
effectiveness which are presented to an individual 
for the purpose of eliciting information about him- 
self. It is not to be confused with a test though 
the latter is also in many cases composed of ques- 
tions. A test elicits behavior which indicates the 
degree to which the examinee possesses knowledge, 
skill, or aptitude required for a particular training 
course or job assignment. A questionnaire, on the 
other hand, is employed primarily to secure personal 
information concerning the individual’s past ex- 
perience or his present attitudes. A questionnaire 
requires the individual to report upon himself, 
rather than causing him to demonstrate his
characteristics in a measurable fashion. Never- 
theless, a properly constructed questionnaire may 
furnish information valuable in predicting future 
achievement. When this is the case, the question- 
naire is a valuable addition to the tools available 
for classification and assignment. 

86. Information Gained Through 
Questionnaires 

The questionnaire is customarily used to collect 
two kinds of information about the soldier. 

a. Objective Biographical Data. This includes
such factual or verifiable information as the
soldier’s age, schooling, marital status, occupational
history, and experience. Some data of this type 
(education and training, for example) are quite 
obviously of value in selecting prospective candi- 
dates for further training or suitable duty assign- 
ments. 

b. Subjective Information. This includes 
opinions, interests, attitudes, likes and dislikes, 
prejudices, as well as habitual fears and worries, 
and characteristic modes of behavior. Information 
of this kind about an individual may throw light 
on his personality and aid in predicting how he will 
act in various critical situations, more or less inde- 
pendent of his knowledge or skill. 

87. Kinds of Questionnaires

The questionnaire may be either oral or written. 

a. (1) The oral questionnaire is essentially a 
standardized interview. (See sec. IV.) The inter- 
viewer asks a series of specific questions and records 
the oral answers of the subject. 

(2) The classification interview conducted in 
the reception center is an example. It follows that 
any questionnaire may be presented orally, if this 
is desirable. 

b. (1) In the more formal written form, the 
questions are presented in booklet form and the 
answers are written by the individual either in 
spaces provided in the booklet itself, or on a sepa- 
rate answer sheet. 



(2) The written questionnaire has certain im- 
portant advantages over the oral type. In the 
first place, it insures more uniformity in the method 
of presentation. Secondly, it provides more 
privacy. In the very personal face-to-face inter- 
rogation, the soldier will sometimes be restrained by 
embarrassment from giving full and honest answers. 
This restraint will be particularly strong if the 
questioner is unskilled in the art of interviewing. 
Consequently, the written questionnaire has an 
advantage in that it does not require skilled
personnel for its administration. Moreover, the 
written form enables the soldier to allot his own time 
and to work longer on those questions which require 
“thinking out,” without feeling rushed (as with the 
oral type) by the patient waiting of the ever-present 
interviewer. And finally, it is an obvious advantage 
of the written type of questionnaire that it can be 
given to groups with all of the economy of time and 
effort that group administration permits. 

c. The check list is a particular variation of the 
written questionnaire. It is composed of a series 
of statements or descriptions, rather than questions, 
and the soldier is instructed to check those which 
are applicable or pertinent to him. It is particu- 
larly useful in gauging the range of occupational 
experiences or the extent of emotional problems. 
Figure 30 shows a portion of a check list used to 
determine the extent of the individual’s experience 
in clerical work in the Army. 

88. Construction of Questionnaire 

The development of a questionnaire for use as a 
classification tool should proceed along the same 
lines as the development of a test. (See ch. 3.) It 
should first be decided for what purposes the ques- 
tionnaire is to be used. Then in the light of a job 
analysis, it should be determined what sorts of 
information are to be obtained — whether age, 
schooling, etc., or interests, attitudes, and opinions. 

a. The questions themselves should be phrased 
in such a manner that they are clear and unam- 
biguous. It is not sufficient, for example, to ask 
“How much education have you had?” For re- 
sults that are definite and unequivocal, the ques- 
tion should be worded somewhat as follows: “What 
is the highest school grade that you completed?” 
Special care should be exercised to avoid certain 
types of questions: 

(1) The “double-barrel” question. This is
really a compound question which can seldom be
answered in a simple yes-or-no fashion. The question
“Are you afraid of lightning and the dark?”
is a case in point. The individual who is filled with
terror by a flash of lightning might have no fear of
darkness and so he is unable to answer truthfully
except in the negative. If both aspects of the
question are important, they should be asked separately.

(2) The leading question. This is a question
which suggests its own answer. A very obvious
type of leading question is the “you are, aren’t you”
type. “You are often uneasy when speaking before
a group, aren’t you?” strongly suggests an answer
in the affirmative.

(First) (Initial)   ORGANIZATION   ARMY SERIAL NUMBER

Below are lists of Army clerical operations. If you have performed
the operation and you can do a [illegible] "on your own," check the
item. Leave blank those items with which you have had very little
or no experience.

1. SOURCES OF INFORMATION
( ) use Army Regulations
( ) use AR 1-5
( ) use TM 12-250 (Administration)
( ) use TM 12-255 (Administrative Procedures)
( ) use Virtue's "Company Administration"
( ) use FM 21-6 (Index to Training Publications)
( ) use War Department Circulars
( ) use TM 12-252 (Army Clerk)

2. COMPANY RECORDS
( ) make entries in the Morning Report
( ) check entries in the Morning Report
( ) make entries in the daily Sick Report
( ) maintain Duty Rosters
( ) write Company Orders

3. [heading illegible]
( ) make entries in Service Records
( ) make entries in Soldier's Qualification Card
( ) make entries in Officer's Qualification Card
( ) prepare furlough forms
( ) prepare locator cards
( ) check rosters
( ) prepare extract copy of Morning Report
( ) prepare descriptive list of Absentee Wanted by U. S. Army
( ) prepare Emergency Addressee Cards
( ) prepare discharge certificates
( ) prepare Certificate of Service
( ) prepare Report of Separation
( ) prepare Extract of Service Record
( ) draft Special Orders
( ) prepare Notification of Discharge

4. MILITARY DISCIPLINE
( ) use Manual for Courts-Martial
( ) prepare Charge Sheets
( ) prepare Court-Martial Orders

5. CORRESPONDENCE AND FILING
( ) type military correspondence
( ) type indorsements
( ) prepare inclosures (letter)
( ) use military abbreviations
( ) prepare message forms
( ) use company correspondence file
( ) use War Department decimal file
( ) use policy file
( ) use suspense file
( ) use 201 file
( ) distribute incoming mail
( ) prepare outgoing mail

6. MACHINE OPERATION
( ) operate an adding machine
( ) operate a ditto machine
( ) operate a mimeograph machine
( ) cut stencils
( ) use stylus
( ) type by touch method
( ) type by "hunt and peck"

7. SUPPLY
( ) prepare Statements of Charges
( ) prepare Reports of Survey
( ) prepare Property Issue Slips

8. FINANCE
( ) prepare pay rolls
( ) prepare officers' vouchers
( ) prepare allotment forms
( ) prepare Application for Dependency Benefits
( ) prepare final statements
( ) prepare insurance applications
( ) prepare Soldier's Deposit Books

Give below any important information which has not been covered
and which you feel is necessary for a complete picture of your
clerical and administrative background.

TD AGO PRT-245

Figure 30. Portion of the Clerical Experience Check List.

b. In the written questionnaire, the items should 
be worded in such a way that they require a mini- 
mum of writing on the part of the person answering 
them. This serves not only to make the question- 
naire easier to fill out, but also insures greater ease 
in “scoring” answers and reduces the element of 
subjectivity. When possible, the questions are so 
phrased that they can be answered with a check 
mark, as in the example. 

(1) Does it bother you to have someone watch you at work, even
though you know you can do it well?   Yes ( )   ? ( )   No ( )

With questions of this sort, of course, separate an- 
swer sheets can be employed and the answers 
“scored” by machine. The item may, also, be 
phrased as a statement, instead of a question, and 
checked as true or false depending upon whether 
or not it applies to the person answering it : 

(2) It bothers me to have someone watch me at work, even though
I know I can do it well.   T ( )   F ( )

(3) I do not try to correct people who express an ignorant
belief.   T ( )   F ( )

89. Preliminary Trial of Questionnaire 

After the questions have been collected or construct- 
ed, the questionnaire should be thoroughly tested 
in operation. It should be given to groups repre- 
sentative of those for whom it is intended, to dis- 
cover whether the questions are worded clearly, 
unequivocally and in language that is not too diffi- 
cult. Its value as a classification instrument should 
be checked, in much the same way as a test is vali- 
dated. (See ch. 3.) The Clerical Experience 
Check List (fig. 30), for example, might be given to 
a group of successful company clerks and to a ran- 
dom selection of soldiers. Analysis of the check 
marks made by both of these groups could then reveal
whether there were any items checked as frequently
by the group of nonclerical soldiers as by
the clerks. Such items would be of no discrimi- 
native value, and so could be deleted from the list. 
On the basis of such preliminary trials and analysis, 
the questionnaire might be revised, some questions 
omitted and others reworded. 
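The comparison of the two groups can be illustrated with a short sketch. The item names, the sample check lists, and the `min_gap` threshold are all hypothetical; the text says only that items checked as frequently by nonclerical soldiers as by clerks have no discriminative value, and the margin used here is an assumption made for the example.

```python
def check_rate(sheets, item):
    """Fraction of the check lists in which the item is checked."""
    return sum(item in sheet for sheet in sheets) / len(sheets)


def nondiscriminating_items(items, clerks, others, min_gap=0.20):
    """Items checked about as often by the comparison group as by the clerks.

    min_gap is a hypothetical margin: an item is kept only if clerks
    check it at least this much more often than the comparison group.
    """
    return [
        item for item in items
        if check_rate(clerks, item) - check_rate(others, item) < min_gap
    ]


items = ["morning report", "touch typing", "check rosters"]

# Hypothetical check lists: each set holds the items one soldier checked.
clerks = [
    {"morning report", "touch typing", "check rosters"},
    {"morning report", "check rosters"},
    {"morning report", "touch typing"},
    {"morning report", "check rosters"},
]
others = [
    {"check rosters"},
    {"check rosters", "touch typing"},
    set(),
    {"check rosters"},
]

to_delete = nondiscriminating_items(items, clerks, others)
print(to_delete)
```

With these sample data only "check rosters" is flagged, since both groups check it equally often.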

90. Scoring Questionnaire 

The results obtained with a questionnaire may be 
handled in several ways. Since the answers indi- 
cate information about the individual, it follows 
that this information should be treated in the 
fashion required to make it an effective aid in as- 
signment. Such data as age, schooling, occupa- 
tional experience, etc., may be useful in themselves. 
Sometimes each item can be assigned a weight and 
entered into a predictive formula — so many points 
for successful completion of a specialist course, and 
so on. With check lists, such as that illustrated 
in figure 30, a simple summation of the check marks 
entered may be considered as a “score.” It may 
be more important to know what duties the soldier 
has performed rather than how many, in which case, 
a study of the list is required. Personality ques- 
tionnaires, or inventories, are often “scored” for 
several different traits; a certain answer to a given 
question may count so many points on the “ag- 
gression” scale, for example, and have some other 
value on another scale. In general, it can be said 
that the way in which the questionnaire is to be 
used will determine the method of handling its 
results. 
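Two of the scoring methods just described, the simple count of check marks and the scoring of one answer on several scales, can be sketched as follows. The scoring key, the point values, and the "anxiety" scale are hypothetical; only the "aggression" scale is named in the text.

```python
from collections import defaultdict

# Hypothetical scoring key: (item number, answer) -> points on each scale.
KEY = {
    (1, "Yes"): {"aggression": 2, "anxiety": 1},
    (1, "No"): {"aggression": 0},
    (2, "True"): {"anxiety": 3},
    (3, "False"): {"aggression": 1},
}


def score_check_list(checked_items):
    """Simple summation of the check marks entered."""
    return len(checked_items)


def score_inventory(answers, key):
    """Add up the points each answer contributes to each scale."""
    totals = defaultdict(int)
    for item, answer in answers.items():
        # Answers that do not appear in the key contribute nothing.
        for scale, points in key.get((item, answer), {}).items():
            totals[scale] += points
    return dict(totals)


answers = {1: "Yes", 2: "True", 3: "False"}
scale_scores = score_inventory(answers, KEY)
count = score_check_list({"maintain Duty Rosters", "cut stencils"})
print(scale_scores, count)
```

Where it matters which duties were performed rather than how many, the set of checked items itself would be studied instead of its count.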

Section IV. INTERVIEW 

91. Definition 

An interview is a conversation with a purpose. It 
is directed along certain channels as predetermined 
by the person conducting the interview. It is essentially
a controlled conversation, but the amount of
control may vary over a wide range. In its most
restricted form it may be of the formal question 
and answer type where it is limited to a series of 
specific topics. At the other extreme, the inter- 
view may have all the appearances of a thoroughly 
free and informal chat touching upon many topics 
according to the fancy of the participants. In a
skilled interview, however, this appearance of complete
indirection will be only superficial. The interviewer,
while purposely creating an informal setting
and permitting the conversation to run freely,
will, by means of questions and comment, imperceptibly
guide the discussion along the desired
channels. Both extremes of the interview have 
their place in good personnel procedures. Where 
detailed specific items of information are desired, 
the formal question and answer type of interview 
serves quite adequately. The oral questionnaire, 
as described in section III of this chapter, represents 
the extreme in this form of interview. As a clinical 
and counseling device, the free and informal dis- 
cussion serves more adequately. Most interviews 
are intermediate between these two extremes. Both 
question and answer techniques and guided inform- 
al conversation are employed. Such is the classi- 
fication interview as conducted in the reception 
centers. 

92. Purpose of Interview 

The interview brings two persons face-to-face and 
permits information to pass between them in both 
directions. This permits personalities to be brought 
into play with the result that a number of purposes 
may be accomplished. The accomplishment of 
some of these purposes is possible only because of 
this interplay of personalities; and results which
could not be obtained with the use of impersonal 
techniques are thus made possible. For the most 
part the purposes accomplished by the interview 
may be grouped under three headings. 

a. To Obtain Information. The interview 
is a valuable, and sometimes the only, means of
collecting information from the subject. Its value 
in this respect varies with the kind of information 
desired. When it is used as a method of revealing 
facts about skills and abilities, every effort should 
be made to verify and expand the information ob- 
tained by means of more objective techniques. As 
a means of collecting such data as age, education,
date of birth, etc., the interview can be as reliable 
a measure as are test techniques and objective 
questionnaires. One of its greatest values, and 
one where it has advantage over tests and objective 
questionnaires, is in the realm of personality. 
When undertaken by one competent in the tech- 
nique of interviewing, it can cut through attitudinal 
and emotional knots to the motivational core of a 
maladjustment in a manner impossible with the 
more impersonal instruments. 

b. To Give Information. As a means of ori- 
enting the soldier, instructing him or offering ad- 
vice, the interview is superior to the class or to the 
lecture since it allows for a presentation of ideas or
counsel tailored to individual dimensions. Although
the interview is more time consuming than
the other methods, the time is considered well spent 
in terms of the satisfaction which the person inter- 
viewed expresses because of the personal attention 
given him. The separation counseling interview as 
used in the Army with its aim of disseminating vo- 
cational information and guidance is an excellent 
example of this type of interview. It is well to re- 
member, however, that any successful interview is 
informative to both parties and an interview at the 
reception center should not only result in the de- 
sired information for the interviewer but must satis- 
factorily answer the questions in the mind of the 
interviewee. 

c. To Boost Morale. Here again the inter- 
view is an excellent device. It is immeasurably 
easier in the face-to-face situation to influence at- 
titudes and emotions than it is by a formal lecture. 
Any unit commander maintains the spirit and mo- 
rale of his organization largely by the personal con- 
tacts he and his assistants maintain with the indi- 
viduals composing the organization. 

93. Reliability of Interview 

Caution must be exercised in forming judgments 
from information obtained in interviews. Sources
of error are inherent in the interviewer, in the per- 
son being interviewed, and in the relations between 
the two. For example, failure on the part of the 
interviewer to formulate the problem in such terms 
that the interview can contribute to its solution, 
may result in the collection of data not at all perti- 
nent to the problem at hand. Reliability of data 
is always limited by the interviewee’s knowledge,
his memory, his attitudes, his ability to observe,
his understanding of what is wanted, and his
verbal capacity for clear and accurate expression
of what he knows.

94. Value of Interview 

While the interview has its weaknesses and limita- 
tions, it has great value as a classification technique. 
Used in conjunction with more objective methods, 
the interview affords the opportunity to integrate 
fragmentary and disjointed information and to 
reconcile apparently contradictory data. While 
subjective in character, it serves to uncover sources 
of information which may be investigated further 
by the more objective methods. The ultimate use 
of all information whether obtained through interview
or by objective techniques is to obtain an overall
picture of the individual. In the hands of the
qualified interviewer, the interview technique fur- 
nishes valuable information on which such judg- 
ments may be based. 

95. Qualifications of Army Interviewers 

(See par. 27, TM 12-425.) The value of the interview
varies directly with the skill of the interviewer
and it is at its best only when conducted by one 
competent in the use of this tool. Expert inter- 
viewers are hard to find but the Army has gone far 
in locating such personnel and in increasing their 
proficiency through training. Some of the more 
important qualifications of good interviewers are 
listed. 

a. General intelligence well above the average. 

b. A definite interest in people. 

c. A willingness and ability to take the point of 
view of the other person. 

d. The ability to evaluate and to discount one’s 
own prejudices. 

e. The art of listening as well as the art of 
conversing. 

f. The art of gaining the confidence of the other
person. 

g. The ability to convince the other person of a 
genuine desire to be helpful. 

h. The ability to exercise tactful control of the 
interview without a domineering attitude. 

i. A skill in observing and evaluating behavior. 

j. A well-practiced skill in separating fact from 
inference. 

k. A broad knowledge of civilian and Army oc- 
cupations and requirements. 

l. A good military bearing and a frank, straight- 
forward manner. 



96. Conduct of Interview 

There is no fixed formula for the interview. It will 
vary according to its purpose and according to the 
personalities of the two participants. It must be 
flexible in form and readily adaptable to personnel 
requirements and idiosyncrasies. Yet there are 
several principles that should apply in most in- 
stances. Many of these points have already been 
outlined in connection with the individual testing 
session (ch. 4); that section should be carefully re- 
viewed. A brief enumeration of the salient features 
of the successful interview is here given as a further 
guide. 

a. The physical setting of the interview is im- 
portant. A quiet room, or at least a separate booth, 
will provide a measure of privacy that is essential. 

b. The interviewer’s preparation for the interview
should include a definition of its general purpose,
a schedule of points to be covered, and a thorough
review of all information already available
concerning the soldier, including all data obtained 
through tests, questionnaires, and rating scales. 

c. The orientation of the subject should start with 
an effort to create a friendly atmosphere, to put 
him at his ease, and should include a straight- 
forward statement on the purpose of the interview. 

d. The tempo of the interview should be adjusted
to the needs of the soldier. He should be given 
plenty of time to answer in his own way, to amplify 
and elaborate, without feeling hurried. 

e. The responses of the person being interviewed 
and any pertinent observations of the interviewer 
should be recorded in some form during the session. 
Brief notes, symbols, marks on previously prepared 
check lists— any or all of these methods should be 
used in order that a complete report of the inter- 
view may be prepared. 
CHAPTER 7

AUTHORIZED ARMY TESTS: USE
IN INDUCTION STATIONS, RECEPTION CENTERS,
AND OTHER ARMY INSTALLATIONS



Section I. PSYCHOLOGICAL TESTING IN 
INDUCTION STATION 

97. General 

Since it cannot be expected that every civilian pos- 
sesses the physical, mental, and moral characteris- 
tics essential to the effective soldier, it is necessary 
to exercise care in the selection of candidates in 
order to reduce to a minimum the number of failures
in training and in service. Laxity in the standards
of acceptance would result in the accumulation
of unsatisfactory soldier material and increase both
the time and money costs of training. It is in the
interests of both economy and efficiency that the
Army has set certain minimum standards for induction.
Moreover, the data gathered during the
screening process is an invaluable aid to [text missing in source]
throughout the career of every soldier. This section
deals with the specific tests, methods, and procedures
of psychological screening employed in
induction stations.

98. Induction Standards 

The standards for induction into the armed forces 
have been specified by the War and Navy Departments.
They may be classified into three categories:

a. Administrative, including age, citizenship,
and character.

b. Medical (physical and neuropsychiatric).

c. Psychological or intelligence.

Standards in the first category are prescribed by
AR 615-500. Physical and neuropsychiatric standards
are outlined in MR 1-9. Minimum intelligence
standards are defined by section XXII of
MR 1-9 in terms of educational achievement and
passing scores on authorized “objective tests of
intelligence.” The tests to be used, and the scores
on these tests to be considered as passing, are specified
by directive.

99. Procedures at Induction Stations

The procedure is essentially one of screening men 
according to the three categories of standards listed 
in paragraph 98. Each requires its own methods
and specially trained personnel. Moreover, each
part of this screening process provides data of value
to all who will subsequently assign the inductee to
any duty or deal with him in any responsible,
authorized, and military fashion whatsoever.

100. Psychological Standards 

According to paragraph 100, MR 1-9: “Individuals
who are graduates of standard English-speaking
high schools are acceptable. Individuals who are
not graduates of standard English-speaking high
schools will be given prescribed objective tests of
intelligence. A man achieving the critical score or
a higher score on one or more of the authorized
tests is acceptable for induction.” The psychological
screening process designed to implement this
regulation is composed of two main parts: a preliminary
screening interview and a series or battery
of objective tests.

101. Preliminary Screening Interview 

It is the primary purpose of this interview to determine
whether or not the selectee is a graduate of
a “standard English-speaking high school.” Consequently,
it is a brief routine interview of the fact-finding
type. Satisfactory evidence of academic
achievement may be obtained from the DSS Form
No. 221, from a transcript of the high school record,
certificate, diploma, or even from a written or verbal
statement. The presentation of such satisfactory
evidence of graduation from a “standard
English-speaking high school” is a sufficient basis
for a recommendation of inductibility, insofar as
mental qualifications are concerned. The lack of
such evidence, however, may not be considered
grounds for rejection. No man is rejected for failure
to meet the psychological or intelligence standards
until he has demonstrated his inability to
qualify for induction on the basis of the prescribed
psychological tests.

102. Psychological Testing Program 

a. Selectees Tested. Only selectees who have 
not completed the required course of study in a 
standard English-speaking high school are referred
to the psychological testing section of the induction 
station. There they are subjected to one or more 
of a series of tests designed to determine their ac- 
ceptability for induction. The tests currently au- 
thorized for this use are: 

(1) The Qualification Test (Q-1 or Q-2).

(2) The Group Target Test (GT-1). 

(3) The Individual Examination (IE-1). 

(4) The Nonlanguage Individual Examination 
(NIE-1). 

These tests were designed specifically for induction 
station use and are restricted to installations of this 
type. 

b. Testing Problem. The fundamental aim 
of the induction station psychological tests is to in- 
sure that non-high-school graduates among selectees 
accepted for induction possess the minimum mental 
capacity to absorb military training and become 
useful or satisfactory soldiers. Non-high-school 
graduates include men who are literate and have the 
necessary ability to learn, but whose formal educa- 
tion has been cut short for one reason or another. 
Also included are men who have gone to school in 
a foreign land, men who are illiterate because of 
lack of educational opportunities, and men whose 
intellectual capacity is seriously limited. Those in 
the last category would constitute a definite detri- 
ment to the Army if inducted. The non-English 
and illiterate may be useful if they can obtain a 
sufficient mastery of the simple verbal and numeri- 
cal skills that are basic to satisfactory completion 
of military training. Non-high-school graduates 
thus fall into three general categories: 

(1) Those possessing sufficient capacity to learn 
and sufficient literacy to undertake basic training. 

(2) Those possessing sufficient capacity to learn, 
but requiring further literacy training before under- 
taking basic training. 

(3) Those whose capacity to learn is inadequate. 

c. Induction Station Testing Battery. The task of
the induction station testing program is to
identify and to separate these three groups. Since 
the men to be classified fall into more than two 
groups, no single testing instrument can do the job. 
A battery of tests must be used, each performing a 
further step in the screening process. 

(1) Qualification Test (Q-1 and Q-2). This
test is a quick screening device to identify those
men who belong in the first category, that is, men
whose capacity to learn and whose literacy level 
are both sufficient to recommend immediate assign- 
ment to basic training. The test consists of 17 
carefully selected items of the free-answer type, 
dealing with such matters as paragraph reading and 
comprehension, simple arithmetic, and what might 
be called general intellectual functions (such as the 
understanding of directions, relations, etc.). Some 
idea of the difficulty of the items can be gained from 
the fact that the average score of non-high-school
graduates is approximately 12. The test may be 
administered to groups of any convenient size; 
scoring is relatively simple and rapid. To allow for 
retesting, two alternate forms of the test are currently
authorized, Q-1 and Q-2. They were standardized
by administering them to some 3,300 non-high-school
graduates in induction stations throughout
the country. The distributions of scores for
the two forms, or the percentage of either group re- 
ceiving any given score, are very nearly identical, 
and for all practical purposes the two forms may be 
used interchangeably. The Qualification Test dif- 
ferentiates among those men who rank low in gen- 
eral learning ability as compared with all Army 
men. Since its content is all verbal or numerical 
and the examinees are required to read, scores on 
the test closely parallel those obtained with instru- 
ments which measure literacy. A very high per- 
centage of those who pass the test are both suffi- 
ciently literate and sufficiently equipped with mental 
capacity to complete basic military training in a 
satisfactory manner. Consequently, all who pass 
this test are classified as inductible and qualified 
for regular training assignment.* 

*The Qualification Test is one of the few Army tests for which
raw scores are not converted to a standard score scale. The
score which is considered as passing is stipulated by directive, and is
subject to change up or down in accordance with
the manpower requirements of the armed forces. As of present
date, the critical or passing score is 9.

Figure 52. The Qualification Test differentiates among men who
rank low in general learning ability.

(2) Group Target Test (GT-1). Qualification
Test failures include persons in both the second and
third categories, paragraph 102b(2) and (3). Separating
them is the job of the remaining tests in the
induction station testing program. Since all these
men will have been tried (on the Qualification Test)
and found wanting in verbal facility, these remaining
tests plumb the depths of mental endowment
without recourse to printed question or written
answer. They are, therefore, performance tests
requiring a minimum of verbal ability. The first of
these is the Group Target Test. As the name implies,
it is a group test. The examiner gives certain
simple verbal directions and with a pointer indicates
certain movements on the wall chart. The examinee
gives his responses by drawing lines connecting
appropriate dots in the “pictures” on his record
sheet. Each of the 28 “pictures” is for a different
problem or item. These are of three types: perception
and memory of patterns, pattern orientation,
and directional orientation. The raw score,
or number of correct responses, is converted into a
standard score based on the performance of the
standardization population, a large representative
sampling of men from the lower half of the distribution
of Qualification Test scores in induction stations
serving widely different geographical and cultural
areas of the country. Unlike the usual Army
standard score scale (see ch. 5) the scale for induction
station tests runs from 0 to 60 with an average
of 30 and standard deviation of 10.
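The arithmetic of such a scale can be sketched as below. The manual does not give the conversion procedure (in practice a published conversion table would be used), so the linear formula, the function name, and the norm figures are assumptions made only to illustrate a scale with an average of 30, a standard deviation of 10, and limits of 0 and 60.

```python
def induction_standard_score(raw, norm_mean, norm_sd):
    """Convert a raw score to the 0-60 scale (average 30, SD 10).

    Assumes a simple linear conversion from the mean and standard
    deviation of raw scores in the standardization population.
    """
    z = (raw - norm_mean) / norm_sd   # distance from the mean in SD units
    score = round(30 + 10 * z)        # rescale to average 30, SD 10
    return max(0, min(60, score))     # keep within the limits of the scale


# Hypothetical norms for a 28-item test: raw mean 14, raw SD 5.
print(induction_standard_score(14, 14, 5))  # a man at the average
print(induction_standard_score(19, 14, 5))  # one SD above the average
print(induction_standard_score(0, 14, 5))   # lowest possible raw score
```

A raw score equal to the norm mean lands at 30, and each standard deviation above or below it moves the converted score by 10 points.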

(3) Individual Examination (IE-1). For the 
bulk of the Group Target Test failures, the Indi- 
vidual Examination is the final hurdle. As its 
name implies, it is an individual test. In view of 
its position in the program, it requires neither read- 
ing nor comprehension of verbal instructions other 
than the simplest. The test is composed of two 
main parts. Part I gets at the examinee’s ability 
to adapt his timing and coordination to a specified 
pattern by six “marching” problems of increasing 
difficulty. He is required to “march” up pathways 
of circles and lines, with the two hands alternately, 
scoring “hits” in the left- and right-hand pathways, 
while keeping in time with a 120-per-minute cadence
counted out by the examiner. The score on each 
problem is the number of successful steps or “hits” 
in the left-hand pathway before an error is made.
The score for the whole of Part I is the sum of the
scores on the six problems. Part II consists of nine 
brief problems involving such tasks as the reproduc- 
tion of patterns with blocks, the arrangement of 
figure sequences, the memory for designs, and sim- 
ple directions. The score for the whole examina- 
tion is expressed in standard score form in the same 
manner as the Group Target Test scores. 
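The Part I scoring rule can be sketched as follows. Recording each step as a true-or-false value is an assumption made for the example, as are the function names and the sample record; the rule itself (count the successful steps before the first error, then sum over the six problems) is taken from the text.

```python
def problem_score(steps):
    """Number of successful steps ("hits") before the first error."""
    score = 0
    for hit in steps:
        if not hit:
            break          # scoring stops at the first error
        score += 1
    return score


def part_one_score(problems):
    """Sum of the scores on the six "marching" problems."""
    if len(problems) != 6:
        raise ValueError("Part I consists of six problems")
    return sum(problem_score(steps) for steps in problems)


# Hypothetical record: True marks a hit, False an error.
record = [
    [True, True, True, True],
    [True, True, False, True],   # the hit after the error does not count
    [True, True, True],
    [True, False],
    [False, True, True],
    [True, True],
]
print(part_one_score(record))
```

For this sample record the six problem scores are 4, 2, 3, 1, 0, and 2.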

(4) Nonlanguage Individual Examination (NIE-1).
Men of foreign extraction who are not literate
in English may be unable to answer any of the items 
of the Qualification Test, and if their understanding 
of spoken English is limited, may also be handi- 
capped on the Group Target Test and the Individual 
Examination. Such men may be educated in their 
native tongues, and given the minimum essentials 
of English verbal facility, may become valuable 
soldiers. The Nonlanguage Individual Examina- 
tion is employed with those men who fall in this 
category. It is administered to men who have 
scored zero on the Qualification Test and have sub- 
sequently failed the Group Target Test, when, in 
the opinion of the induction station personnel con- 
sultant, they may be classified as non-English- 
speaking. The content of the test is completely 
nonverbal and is administered by means of panto-
mime, gestures, and demonstrations. Oral instruc-
tions, even in the examinee’s native tongue, are not
allowed. A typical page of the test contains two
series of pictures, figures, or designs, one series
at the top of the page and one at the bottom. The
examinee merely draws lines on the page connecting
like or similar pictures in the two series. The prob- 
lem, of course, lies in the detection of the resem- 
blance, and this varies from identity (as for ex- 
ample, a tank in each series) through similarity of 
function to the rather obscure likeness of a pair of 
designs. The raw score is the number of correct 
pairings, and this is converted into a standard 
score in the same manner as the other tests in the 
battery. 

103. Recording Scores 

The scores received on each of the tests in the induc- 
tion station series are recorded on the work sheet 
which accompanies each man through the entire 
screening process, and on the DSS Form 221. If 
the individual is accepted for induction the latter 
form will go with him to the reception center, where 
the scores will be transcribed to his WD AGO Form 
20, giving the official abbreviation of the test and 
the score. For the Qualification Test, the raw score 
is recorded, while for the remaining tests the con-
version into standard score form is used, as shown in
the example below: 



(18) - (L) Other tests

Test        Grade - Score

Q-1         7
GT-1        25
IE-1        29
NIE-1       30



104. Summary of Induction Station Test 
Procedures 

a. Men who have not graduated from standard 
English-speaking high schools are administered the 
Qualification Test and, according to their perform- 
ance, are separated into three classes: 

(1) Those whose score is 9 or above. 

(2) Those scoring 1 through 8. 

(3) Those scoring zero. 

b. The first class is considered inductible and 
ready to embark upon regular basic training. The 
second and third classes are given the Group Target 
Test. Those in the second class who pass the Group 
Target Test, or, failing this, pass the Individual 
Examination, are classified as acceptable for in- 
duction, but are qualified for Special Training Unit 
assignment. There is one modification of this gen- 
eral rule — men in this class who score below 15 
(standard score) on the Group Target Test may be 
rejected, at the discretion of the personnel consult- 
ant, without attempting the Individual Examina- 
tion. Those in the third class (Qualification Test 
scores of zero) who pass the Group Target Test are 
likewise acceptable for induction and are quali- 
fied for Special Training Unit assignment. If a 
man who understands the English language fails, 
he is considered noninductible and is rejected follow- 
ing a terminal interview. If a man in this third 
class fails the Group Target Test, and it is deter- 
mined that he does not understand English, he will 
be given an opportunity to demonstrate his ability 
on the Nonlanguage Individual Examination. A 
passing score on this test will qualify him for in- 
duction and for Special Training Unit assignment, 
regardless of his performance on the previous tests. 
A score below passing will be cause for rejection. 
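
The procedure summarized in this paragraph may be sketched as a decision routine. This is an illustration under stated assumptions, not an official procedure: passing scores for the Group Target Test, Individual Examination, and Nonlanguage Individual Examination are not given here, so pass or fail is supplied by the caller; the function and parameter names are hypothetical; and a second-class man who fails both the Group Target Test and the Individual Examination is assumed to be rejected.

```python
from enum import Enum


class Outcome(Enum):
    REGULAR = "inductible; regular basic training"
    SPECIAL_TRAINING = "acceptable; Special Training Unit assignment"
    REJECTED = "rejected"


def classify_selectee(q_score, gt_passed=False, gt_standard=None,
                      ie_passed=False, understands_english=True,
                      nie_passed=False, consultant_rejects_low_gt=False):
    """Apply the par. 104 rules to one non-high-school graduate."""
    if q_score < 0:
        raise ValueError("q_score cannot be negative")
    if q_score >= 9:                       # first class
        return Outcome.REGULAR
    if gt_passed:                          # second or third class, passed GT-1
        return Outcome.SPECIAL_TRAINING
    if q_score >= 1:                       # second class, failed GT-1
        low_gt = gt_standard is not None and gt_standard < 15
        if low_gt and consultant_rejects_low_gt:
            return Outcome.REJECTED        # rejected without attempting IE-1
        # Assumption: failing IE-1 as well means rejection.
        return Outcome.SPECIAL_TRAINING if ie_passed else Outcome.REJECTED
    # Third class (Qualification Test score of zero), failed GT-1.
    if understands_english:
        return Outcome.REJECTED
    return Outcome.SPECIAL_TRAINING if nie_passed else Outcome.REJECTED


print(classify_selectee(5, gt_passed=False, gt_standard=20, ie_passed=True).value)
```

The discretion of the personnel consultant is represented by a flag because the text leaves that decision to his judgment rather than to a fixed rule.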



105. Validity of Induction Station Tests 

The aim of the induction station testing program 
is to select and qualify for induction those among the 
non-high-school graduates who are likely to be- 
come satisfactory soldiers, and to identify and elim- 
inate those who threaten to become detrimental 
burdens to an efficient Army team. That the pro- 
gram accomplished this end was effectively demon- 
strated through the extensive research that led to 
the development and selection of the battery of tests 
employed. More men rated on performance in 
basic infantry training as satisfactory soldiers and 
fewer rated as unsatisfactory are accepted by the
foregoing selection procedure than would be the 
case if selection were made without tests or on the 
basis of procedures previously employed for the 
same purpose. Fewer satisfactory and more un- 
satisfactory soldiers are rejected. 

106. Tests Previously Authorized for Psycho- 
logical Screening at Induction Stations 

a. The testing program described above was au-
thorized and instituted as of 1 June 1944. Previous
to this time, since September 1942, other tests were 
employed for the same purpose. Since many men 
now in the Army were selected by those other tests, 
and their scores entered into Army records, a brief 
description of each of these earlier instruments will 
be given here. They include: 

(1) The Qualification Test.

(2) The Visual Classification Test. 

(3) The Individual Battery: Wells’ Concrete 
Directions Test and the Block Counting Test. 

b. The Qualification Test is the same as that 
currently used and described previously. In the 
earlier program, however, the critical score was 7. 
Men reaching or exceeding this score were classified 
as “literate — inductible,” and were consequently 
qualified for regular assignment. Men scoring 6 or 
below were classified as “illiterate”; if they passed 
one of the subsequent tests they were acceptable 
but qualified for initial Special Training Unit 
assignment. If they failed all subsequent tests, 
they were considered non-inductible. 

c. The Visual Classification Test was designed 
as a group test of general learning ability requiring 
no language understanding either to comprehend 
the instructions or to indicate the responses. Its 
content was completely pictorial. Each item con- 
sisted of five illustrations of objects; four of these 
belonged to the same class or category, the fifth 
being different. By means of pantomime and
simple oral directions, the examinee was directed 
to identify and cross out that picture in each series 
which did not logically belong in the same category 
with the remaining four. 

d. Individual Tests were designed for men who 
failed both the Qualification Test and the Visual 
Classification Test. These men were put through 
a final screening with the two individual tests: the
Wells’ Concrete Directions Test and the Block 
Counting Test. Both of these tests were of the 
performance type, the former involving the re- 
lational manipulation of some common tools and 
objects, the latter requiring the correct count of 
the blocks contained in a pictured aggregation. 

e. Table II summarizes some of the pertinent 
data on these earlier tests. They are presented 
here as an aid to the evaluation and interpretation 
of scores which will be frequently found in the
records of men throughout the Army. 
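
The screening sequence of the earlier program may be sketched as follows, using the critical score of 7 and the passing scores shown in Table II. The sketch assumes that a passing score is a minimum (a score at or above it passes); the function name and the sample scores are hypothetical.

```python
# Critical and passing scores of the earlier program (Table II).
Q_CRITICAL = 7      # Qualification Test
VC_PASS = 36        # Visual Classification Test
CD_PASS = 52        # Wells' Concrete Directions Test
BLOCK_PASS = 12     # Block Counting Test

LITERATE = "literate, inductible; regular assignment"
SPECIAL = "illiterate, acceptable; initial Special Training Unit assignment"
NON_INDUCTIBLE = "non-inductible"


def classify_earlier_program(q, vc=None, cd=None, block=None):
    """Classify a man screened between September 1942 and 1 June 1944.

    Scores not taken are passed as None. Assumption: a score equal
    to or above the passing score passes the test.
    """
    if q >= Q_CRITICAL:
        return LITERATE
    if vc is not None and vc >= VC_PASS:
        return SPECIAL
    # Both individual tests had to be passed for acceptance.
    both_taken = cd is not None and block is not None
    if both_taken and cd >= CD_PASS and block >= BLOCK_PASS:
        return SPECIAL
    return NON_INDUCTIBLE


# Hypothetical record: failed Q and VC, passed both individual tests.
print(classify_earlier_program(q=4, vc=30, cd=55, block=13))
```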

107. Terminal Interview 

Before any selectee is rejected for failure to meet 
the minimum psychological standards for in- 
duction, he is given a final brief interview. During 
this interview an attempt is made to discover and 
account for any discrepancies between test per-
formance and previous personal history. Low test
scores that are inconsistent with adequate educational
or occupational experience may suggest faulty
test administration, misinterpretation of instruc-
tions, lack of proper motivation, or deliberate
malingering. Even clerical errors committed in 
scoring, adding, or recording results may on occasion 
account for startling inconsistencies. Whenever 
the interviewer has reason to believe that the test 
score does not accurately reflect the selectee’s 



true ability, he may recommend a reexamination 
on any of the tests in the series. Authorization 
to retest should never be abused, however. The
purpose of reexamination is not to boost induction 
rates at the expense of Army efficiency, but to insure 
the precision of selection procedure. 

Section II. PSYCHOLOGICAL TESTING IN 
RECEPTION CENTER 

108. General 

When the selectee has been examined and found 
to be a suitable prospect for military service, he 
moves on to the reception center for equipment, 
orientation, classification, and assignment. Per- 
sonal history data are collected and systematized; 
mental abilities and aptitudes are determined. All 
of this information is recorded, coded, and punched 
on the WD AGO Form 20, Soldier’s Qualification 
Card. This card accompanies each man throughout 
his military career and serves as a continuous and 
cumulative record of his experience, training, and 
qualifications. 

109. Classification Interview 

One of the most important phases of reception 
center procedure is the individual interviewing of 
each incoming soldier. It is during this interview 
that the WD AGO Form 20 is initiated, and in view 
of the emphasis placed upon this document in all 
selections and assignments affecting the soldier 
throughout his Army service, it is evident that the 
interview is of vital importance to the Army and 
to the soldier himself. 

a. Soldier’s Qualification Card. The WD 
AGO Form 20 (Soldier’s Qualification Card) is the 
basic instrument of the Army classification system. 



Table II

Important Statistics of Earlier Induction Station Tests

Name of Test                      Official        Maximum   Passing
                                  Abbreviation    Score     Score

1. Qualification Test             Q-1 or Q-2      17        7
2. Visual Classification Test     VC-1a           50        36
3. Wells’ Concrete Directions*    CD              63        52
4. Block Counting Test*           DST-10          16        12

*Passing scores on both individual tests were required for acceptance.

It is designed to provide spaces for all information
about the soldier that may be helpful in evaluating 
his past experience and in judging his potentialities 
for future development. It may be said that the 
most valuable resources of an Army are the skills, 
potentialities, and training experiences represented 
in the men who make up that Army. With the 
number of these men running into the millions, 
the value of the Soldier’s Qualification Card be- 
comes more evident. When a group is small, it is 
possible for those in command to become intimately 
acquainted with each of the men — to learn at first 
hand what the individual has done in the past, how 
well he executes his assignments, what particular 
skills he possesses, his capacity for leadership, and 
so on. If such a group remained the same, one 
could expect that proper selections and assign- 
ments would be made and all men used effectively 
even though the only personnel records were filed 
away in the mind of the commander. But the 
Army is not like that. Transfers at all stages and 
casualties in combat areas create a condition of 
continuous change among both officers and men. 
Replacements are often made from much larger 
groups where individual differences cannot be 
clearly observed. Every able officer should know 
his men, but when he leaves his command, he 
should pass that knowledge on to his successor. 
Otherwise the time and effort he has expended in 
training his men and in cataloguing their qualifi-
cations and skills may be largely wasted. His
knowledge that Corporal Smith is an efficient 
typist, or that Private Jones has learned, under his 
tutelage, the secrets of scouting and patrolling 
may give him some satisfaction. But unless this 
information is recorded and passed on, Corporal 
Smith’s new commander is quite likely to overlook 
him when picking a new company clerk, and Private 
Jones may be trained all over again for a different 
assignment. The WD AGO Form 20 is a record of 
experience and training which transfers knowledge 
of men. Through its proper use, any officer can 
know his men and employ them effectively even 
before seeing them. Corporal Smith’s card, for 
instance, will show his civilian experience as a 
typist, his degree of skill, his performance on a
typing test, and any previous experience as com-
pany clerk. Some of this information will be coded
and punched in the margins of the card so that 
Corporal Smith, along with every other typist in 
the group, can be identified in a few minutes’ time.
In a sense, the card is a talking picture of the man,



showing what he is like and telling what he can do. 
Its value will depend on the extent to which it is 
an accurate and faithful reproduction. Its useful- 
ness will depend upon the intelligence with which 
responsible officers apply the information the card 
supplies. 

b. Purpose of This Interview. The purpose 
of the classification interview is to discover and 
enter upon the Soldier’s Qualification Card all in- 
formation about the man that will go toward making 
that record the most complete and accurate picture 
possible at this early stage of the soldier’s career. 
Such information will include his age, physical 
condition (Physical Profile Serial, reference, MR 
1-9 Supplement, 30 June 1945), education, foreign 
language proficiency, civilian occupational history, 
hobbies, previous military experience (ROTC, 
CMTC, CCC, etc.) and positions of leadership. 

It will also include scores he has made on certain 
authorized tests that are administered to all new 
soldiers. A detailed discussion of each of the items 
of the WD AGO Form 20 is presented in TM 12-425, 
which should be used at all times as a guide to the 
interviewing procedure. 

c. Importance of This Interview. A more 
general explanation of the nature of the interview 
and the selection and training of interviewers is 
contained in chapter 6 of the present manual. 
The material presented there should be carefully 
studied. The use to which the information will 
later be put should serve as a constant reminder of 
the importance of good interviewing. Some soldiers 
are not sufficiently impressed with the importance 
of the interview. They look upon it as a tedious 
but necessary routine, and consequently may not 
tell about some minor but valuable work experience. 
Some seem to feel that it is good Army practice to 
speak only when spoken to — to give only “yes” or 
“no” answers to direct questions. Others will 
embroider the most trivial facts into high-sounding 
statements intended to impress. The successful 
interviewer will create the proper impression to 
open the conversational floodgates. And he will 
seine the stream of talk, throwing back the small 
ones and saving the larger ones for the record. 

d. Oral Trade Questions (developed by United 
States Employment Service). The most significant 
and important single item of the WD AGO Form 20 
is that dealing with the soldier’s main civilian 
occupation (MCO). If the Army is to make use 
of all important civilian occupational experiences
and skills of its men, it is essential that such occupa-
tional experience be properly classified and
coded, and that some indication of the soldier’s
proficiency in that field be recorded. Unfortunate-
ly, it is not always possible to obtain such infor-
mation by the simple procedure of asking the man.

Figure 35.

Because of the lack of uniformity in occupational
titles, the differences in standards of proficiency, 
and, sometimes, the plain “cussedness” of human 
nature, the man’s own statement of his civilian 
job and his skill in it will not lead to accurate 
classification. The man who lays claim to the trade 
of bricklayer may, in reality, have been a hod carrier, 
a bricklayer’s helper, or a carpenter. Or he may 



have specialized in a narrow phase of the craft and 
be inexpert in other essentials. Consequently, it 
is desirable to check this man’s knowledge with 
reference to the trade or occupation in which he 
claims to be skilled. 

(1) The Oral Trade Questions are standardized 
sets of questions relating to a variety of skilled 
trades. In each set, the questions are specific and
require specific answers; the wording is simple and 
in the language of the worker. Each set contains 
approximately 15 questions carefully selected on 
the basis of their actual discrimination between 
groups of workers of known characteristics. Correct 
or acceptable answers are all contained in the
manual, and the score is simply the number of 
questions answered correctly. The extent to which 
they differentiate degrees of skill is illustrated in 
table III which shows the average (median) scores
for three groups of workers on the questions for 
bricklayers. 



Table III

Average Score for Three Groups of Workers on the Oral Trade Questions
for Bricklayers

Occupational group                      Average score

Expert Bricklayers                      12
Bricklayer Apprentices and Helpers      5
Workers in Related Fields               1



(2) In use, the Oral Trade Questions become a 
part of the classification interview. They should
be prefaced by some such remark as “I’d like to ask 
you a few questions about the work you did before 
coming into the Army.” They should be read in a 
natural conversational tone, but they should never 
be altered or enlarged upon in any way. Scores 
are interpreted according to the norms accompany- 
ing each set of questions in the manual, and they 
should be recorded on the WD AGO Form 20 in
the manner specified by TM 12-425.

110. Reception Center Testing Program 

Among the items of information that are recorded 
on the Soldier’s Qualification Card are the scores 
on all authorized tests which the soldier may take 
during his Army career. The first are scores of his 
induction station tests. The next are scores of 
tests given to all soldiers entering reception centers. 
All of these have widespread application through- 
out the soldier’s career and should, therefore, be 
administered as early as possible. Some of the 
scores of these tests are used immediately in mak- 
ing the initial assignments of the men. There is 
one exception to this general rule of testing all 
incoming men: those who have been accepted at 
the induction station and classified as “illiterate,” 
or qualified for Special Training Unit assignment, 
will not be given the reception center tests until
they have satisfactorily completed this course of 
special training in reading and arithmetic. 



111. Army General Classification Test 

a. The first tests to be administered in the 
reception center series are those which constitute 
the Army General Classification Test-3 (AGCT-3). 
As the name indicates, these tests measure abilities 
and aptitudes which underlie a large number of 
Army assignments. They are given at the re- 
ception center in order to obtain an early estimate 
of the assignment or training regime in which the 
soldier is likely to be most proficient. There are 
four tests in the battery,* each yielding a separate 
score of a separate important skill. A general 
over-all score is obtained by summation of these 
four separate scores. Performance on each of the 
four separate tests and total test performance are 
expressed in standard score form and in Army 
grades. (See ch. 5.) The total scores are com- 
parable to those obtained with the previous editions 
of the Army General Classification Test (see par. 
113a) which have been administered to over eight 
million men and which have found widespread and 
valuable employment in the selection of men for 
all kinds of Army specialist training and assign- 
ment. Thus, the total AGCT-3 score may be 
interpreted as a measure of general learning ability, 
or the factor commonly referred to as “general 
intelligence.” 

b. The major difference between the present and
previous forms of the Army General Classification 
Test is the provision in the AGCT-3 for separate 
scores for each of the abilities or aptitudes measured 
in addition to the total over-all score. This differ- 
ence also constitutes the chief advantage of the 
present form of the test. While it is undoubtedly 
true that the composite measure of general learn- 
ing ability is basic to success in most Army training, 
Army training courses and assignments differ 
materially in the demands they make upon partic- 
ular skills. The company clerk, the automotive 
mechanic, and the fire control computer will all 
require a given measure of general intellectual 
capacity. But this general capacity, as the name 
implies, is compounded of a variety of specific skills. 
The man who scores high on tests of “general 
intelligence” is the one who knows a lot about a 
variety of topics. Yet he will usually be more 
skilled or better informed about some of these 
topics than others, and these peaks and valleys in 
his mental makeup should be taken into account 
in selecting him for a particular assignment. If 

*Four noninformation tests are available at present. Other tests measuring knowledge
of automotive mechanics and general shop work are still in the process of development.

his verbal or language facility is more highly de-
veloped than his mechanical or mathematical
facility, he should qualify for such assignments 
as clerk or instructor which place a premium on 
verbal ability. On the other hand, surveying and 
controlling the fire of coastal batteries often re- 
quire a mathematical ability of a higher order. 
It is, then, the primary advantage of the AGCT-3 
that it provides reliable and thorough measures of 
some of these specific abilities of which general 
ability to learn is composed. 

c. The four tests of the AGCT-3 are: Reading 
and Vocabulary; Arithmetic Computation; Arith- 
metic Reasoning; and Pattern Analysis. The 
nature and content of the first three of these is 
self-evident, and the usefulness of the abilities they 
measure (verbal and numerical) in predicting success 
in training is well established. The Pattern Analy- 
sis Test is an improved version of part II of the 
Mechanical Aptitude Test earlier included in the 
reception center testing program. It is composed
of line-drawn pictures of three-dimensional objects
accompanied by outline patterns from which these 
objects are formed. Various edges of the picture 
and lines of the pattern are marked, and it is the 
examinee’s problem to match up corresponding 
edges and lines. In other words, the test measures 
skill in the mental manipulation of spatial relations 
and the visualization of three-dimensional form. 
That such skills are related to or predictive of 
mechanical proficiency in the Army has been well 
established. One illustration will suffice. The 
present Pattern Analysis Test was administered to 
airplane mechanic students at the AAFTC, Keesler 
Field, Mississippi, and test scores compared with 
grades in the course. The extent to which scores 
on the test can be used to predict course grades 
is illustrated in the following table: 

Table IV

Relation Between Pattern Analysis Test Score and Grades in Airplane
Mechanics Courses
(AAFTC, Keesler Field, Miss.)

Men receiving a test score of:                 100    110    120

Have these chances in 100 of reaching
or exceeding the average grade in course:      55     63     75

From this table it is evident that the predictive 
value of the Pattern Analysis Test is high; men 
scoring 120 on the test, for example, stand a 3 to 1 
chance of being average or better students in the 
course. 
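
The "3 to 1" figure follows directly from the chances in 100 given in table IV. A short sketch of the arithmetic is given below; the function name is hypothetical and is used only for illustration.

```python
from fractions import Fraction


def odds_in_favor(chances_in_100):
    """Convert chances in 100 into odds in favor.

    A man with 75 chances in 100 of success has 25 chances in 100
    of failure, so his odds are 75 to 25, or 3 to 1.
    """
    if not 0 < chances_in_100 < 100:
        raise ValueError("chances_in_100 must lie between 0 and 100")
    return Fraction(chances_in_100, 100 - chances_in_100)


# Test scores and chances in 100 as given in table IV.
for test_score, chances in [(100, 55), (110, 63), (120, 75)]:
    odds = odds_in_favor(chances)
    print(f"score {test_score}: {float(odds):.2f} to 1")
# score 100: 1.22 to 1
# score 110: 1.70 to 1
# score 120: 3.00 to 1
```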



d. The Army General Classification Test-3 
constitutes a comprehensive and flexible set of 
measuring tools. Taken all together, they provide 
an accurate and valuable indication of general 
ability to learn in a wide variety of Army situations. 
Taken singly and in combination, they can serve 
as useful predictors of success in specific courses of 
training or specific assignments. The composite 
score finds immediate use in the assignment of 
men to the training centers of the various arms and 
services, and later, to service and combat units, 
since the advisability of building balanced units 
has been abundantly demonstrated. The specific 
scores, considered in conjunction with this over-all 
score, are most advantageously employed in the 
selection of men for specialist training. 

112. Army Radio Code Aptitude Test 
(ARC-1) 

One of the outstanding features of modern warfare 
is the extensive employment in all echelons of all 
arms and services of the most advanced technolog- 
ical improvements in such fields as transportation, 
engineering, ordnance, medicine, and communi- 
cation. A mobile Army, to maintain effective con- 
trol, must be coordinated by a complicated network 
of communications. More than ever before, radio 
is playing a major role in the conduct of hostilities. 
Most of this is voice radio, to be sure; nevertheless, 
the Army’s needs for radio code operators are larger 
than can be met by the selection of all experienced 
code operators who are inducted into the Army. 
Consequently, the training of radio code operators 
is an urgent necessity. Experience has demon- 
strated marked individual differences in aptitude 
for learning to receive code messages, and research 
has shown that this aptitude can be predetermined 
by means of suitable tests. Such a test is the Army 
Radio Code Aptitude Test (ARC-1), given to all 
men at reception centers for the purpose of predict- 
ing probable success in this kind of training. 

a. The Army Radio Code Aptitude Test (ARC-1)
is presented by means of phonograph recordings.
It involves the learning and recognition of the code 
signals for three specified letters of the alphabet. 
The test proper is preceded by a learning period 
during which these code signals are sounded accom-
panied by an announcement of the corresponding
letter. In the test itself, the signals alone are pre- 
sented in random order, and the examinee must indi- 
cate which of the letters the signal represents. The 
first half of the items are presented at a speed equiv-
alent to approximately 11 words per minute, and
the last half at approximately 15 words per minute. 
The score for the test is the number of correct recog- 
nitions minus one-half the number wrong, and this 
is converted into standard score terms in the usual
manner. 
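
The raw scoring rule just stated may be sketched as follows. The item counts in the example are hypothetical. It may be noted, as an observation not made in the text, that with three possible letters the deduction of one-half point per wrong answer agrees with the usual correction for guessing, so that blind guessing yields an expected score of zero.

```python
def arc1_raw_score(num_right, num_wrong):
    """Raw score: correct recognitions minus one-half the number wrong."""
    if num_right < 0 or num_wrong < 0:
        raise ValueError("counts cannot be negative")
    return num_right - 0.5 * num_wrong


# Hypothetical examinee: 60 right, 20 wrong.
print(arc1_raw_score(60, 20))  # 50.0

# Hypothetical blind guesser on 90 items with three choices each:
# about one-third right and two-thirds wrong.
print(arc1_raw_score(30, 60))  # 0.0
```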

b. That the Army Radio Code Aptitude Test 
is capable of predicting probable success in training 
to receive code signals has been amply demonstrat- 
ed. The following table (table V) illustrates the 
relationship between test scores and code-receiving 
speed in several classes after two months of training. 
This relationship is expressed in terms of the per- 
centage of men within certain score ranges on the 
test who had reached or exceeded a code-receiving 
speed of 12 words per minute (the average rate for 
this stage of training). Thus, of men scoring 130 
or above on the test, 87 percent had reached or ex- 
ceeded the average rate, after two months of train- 
ing, of 12 words per minute. Test scores within 
this range are, therefore, considered as superior, 
and anyone scoring this high stands a chance of 
better than 6 to 1 of making a satisfactory comple- 
tion of the course of training. On the other hand, 
persons scoring below 100 (and since 100 is the av- 
erage standard score, this means half of the Army) 
stand about the same chance (6 to 1) of poorer-than-
average performance in the course.
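
The interpretation of an ARC-1 standard score according to the ranges of table V may be sketched as a simple lookup. The function name and data layout are hypothetical; the ranges and percentages are those of table V.

```python
# (lowest standard score in range, interpretation,
#  percent receiving 12 words per minute or better), from table V.
TABLE_V = [
    (130, "Superior", 87),
    (110, "Satisfactory", 70),
    (100, "Low", 48),
    (float("-inf"), "Unsatisfactory", 32),
]


def interpret_arc1(standard_score):
    """Return the table V interpretation and percentage for a score."""
    for lowest, interpretation, percent in TABLE_V:
        if standard_score >= lowest:
            return interpretation, percent


print(interpret_arc1(134))  # ('Superior', 87)
print(interpret_arc1(104))  # ('Low', 48)
```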

113. Earlier Reception Center Tests 

The tests so far described as constituting the recep- 
tion center testing program were developed after 
months of research and experiment. But while 
these were being developed, other tests were being 
used, and their results will be found in the records 
of millions of men now in the Army.

a. Army General Classification Test-1. 
Perhaps the most widely known of all Army tests 
is the Army General Classification Test-1 or AGCT- 
1. In all, four forms of the test were developed. 
The first two forms, AGCT-1a and 1b, after serv-
ing with approximately four million men, were su-
perseded by AGCT-1c and 1d. These editions in
turn were used with the next four or five million 
men. Scores were widely used. They played a 
prominent part in certain assignments. They were 
used in the selection of men for specialist schools 
and for officer candidate training. Consequently, 
the test enjoyed wide publicity — and suffered ex- 
tensive misinterpretation, especially since it was 
widely — though wrongly — known as the “IQ Test.” 
The AGCT-1 was composed of three kinds of ques- 
tions — vocabulary, arithmetic, and “box counting.” 
Its major disadvantage was its inability to yield 
separate scores of these three abilities. Neverthe- 
less, it was of undoubted value as a thorough and 
reliable measure of general learning ability, and, as 
subsequent chapters will show, it produced results 
which were directly related to success in many dif- 
ferent kinds of courses of training. 

b. Mechanical Aptitude Test (MA). This 
test was designed to estimate chances for success in 
training for assignments of a mechanical nature. 
The test was in three parts, each scored separately. 
The part scores were converted and recorded as 
Army grades while the total score was expressed 
as a standard score as well as an Army grade. In 
the first form of the test (MA-1) the three parts
were: mechanical movements; surface develop-
ment; and shop mathematics. This form was
superseded by MA-2 and MA-3, each of which was
composed of the following parts: mechanical in-
formation; surface development; and mechanical
comprehension. (A fourth form, MA-4, was de-
signed and authorized for use with the WAC.) In
general, high scores on the test are indicative of an 
aptitude for training in technical or mechanical 
courses such as those for motor mechanics, artillery 
mechanics, or aircraft armorers. 

c. Radiotelegraph Operator Aptitude Test 
(ROA-1, X-1). This was the test used earlier to
aid in the selection of radio code operators. A man 



Table V

Code-Receiving Speed After Two Months of Training Compared with Scores on the ARC-1

Interpretation    Standard score range    Percent receiving at the rate of
                                          12 words per minute or better

Superior          130 and above           87
Satisfactory      110-129                 70
Low               100-109                 48
Unsatisfactory    Below 100               32

who is able to differentiate code patterns which he
hears will, in general, learn code more quickly than 
will a man who does not recognize such differences. 
He will, therefore, be a better risk for training as a 
radio code operator. The Radiotelegraph Operator 
Aptitude Test consisted of two consecutive adminis- 
trations of the old Signal Corps code aptitude test. 
It was presented by means of phonographic re- 
cordings. Each item consisted of two code pat- 
terns sounded in succession, and the examinee was 
required to decide whether the two patterns were 
the same or different. In general, men who have 
had previous experience with code, or men who play
a musical instrument, tend to differentiate better 
and to make higher scores, and with such men, the 
predictive value of the test is apt to be high. With 
inexperienced men, where the real selection problem 
lies, the ROA will select fewer men than the ARC-1, 
and fewer of those selected will become satisfactory 
operators. Hence, its replacement by the improved 
test. 
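
The score bands of Table V can be read as a simple lookup. The following is a minimal sketch only: the names `Band`, `TABLE_V`, and `interpret_arc1` are hypothetical and introduced here for illustration, the lower bound of 0 for the lowest band is an assumption standing in for "Below 100", and the figures are those printed in Table V.

```python
from typing import NamedTuple


class Band(NamedTuple):
    lower: int           # lowest standard score falling in the band
    label: str           # interpretation printed in Table V
    percent_12wpm: int   # percent receiving 12 words per minute or better


# Bands from Table V, highest first. The bound of 0 on the last band is
# an assumption that stands in for "Below 100".
TABLE_V = [
    Band(130, "Superior", 87),
    Band(110, "Satisfactory", 70),
    Band(100, "Low", 48),
    Band(0, "Unsatisfactory", 32),
]


def interpret_arc1(standard_score: int) -> Band:
    """Return the Table V band into which an ARC-1 standard score falls."""
    for band in TABLE_V:
        if standard_score >= band.lower:
            return band
    raise ValueError("standard score must not be negative")
```

A score of 115, for example, falls in the "Satisfactory" band, for which Table V reports that 70 percent of men reached 12 words per minute.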

Section IN. PSYCHOLOGICAL TESTING 
IN OTHER INSTALLATIONS 

114. Continuing Need for Testing

The use of psychological tests is not limited to in- 
duction stations and reception centers but occurs in 
installations of all types and for many reasons. 
Personnel selection and initial assignment for train- 
ing obviously require the kinds of information about 
a soldier’s abilities which these tests furnish; but 
it is equally true that such information is valuable 
whenever decisions must be reached regarding 
changes in assignment or selection for more ad- 
vanced training. It is for these reasons that each 
soldier’s test scores are recorded along with his 
personal history data on WD AGO Form 20 (Sol- 
dier’s Qualification Card) which follows him wher- 
ever he goes until he is separated from the service. 
Although every effort is made to safeguard person- 
nel records there are instances both in the zone of 
interior and in theaters of operation when it is neces- 
sary to reconstruct records and to replace those 
which have been lost. Likewise, in the transfer of 
personnel, records are often delayed in transit with 
the result that information pertaining to test results
is not available at the time it is most needed.
In these instances it is necessary to retest these persons
in order that test scores may be available.
Such testing may involve only the general classifi- 
cation test or it may include any number of special- 
ized tests which are described in other sections of 
this manual. 

a. The need for test information is likely to be 
acute when soldiers arrive at a training center,
redistribution center or a replacement depot. The
large assortment of abilities represented must be 
distributed at once in such a manner that each requi- 
sitioning unit receives its due proportion of the avail- 
able abilities and skills. Likewise the maximum 
use must be made of the abilities of each man who 
is available for reassignment. Replacement units 
in overseas theaters make frequent use of the tests 
described elsewhere in this manual to aid in the 
proper assignment of hospital evacuees and other 
personnel received for retraining and reassignment. 

b. Sometimes there are reasons to believe that 
a man’s test score does not correctly represent his 
abilities. In order to verify or correct the recorded
score, the soldier may be given the chance to show 
what he can do on an alternative form of the test. 
The regulations pertaining to officer candidates 
(AR 625-5) stipulate that “a score of 110 or higher 
in the Army General Classification Test is required.” 
Since other regulations permit a retest of individuals 
with the AGCT when there is reason to believe pre- 
vious scores are invalid, a readministration of the 
AGCT is sometimes warranted when an applicant 
for Officer Candidate School has demonstrated ability
beyond that which is implied by the score on an
early administration of the test. In all cases of
retesting the provisions of paragraph 18, TM 12-425
will be followed.

c. Clinical diagnosis frequently requires the use 
of tests in hospitals, disciplinary barracks, consulta- 
tion sections in training centers, and processing cen- 
ters. The results of these tests provide valuable aid 
to clinical psychologists and psychiatrists who are 
engaged in curative and preventive programs of
psychotherapy. Reconditioning and retraining analyses
of skills and abilities and, in many instances,
personnel measurements are used to assist in de- 
termining proper levels and types of placement for 
convalescing patients. These tests so employed 
are described elsewhere in detail.

CHAPTER 8

AUTHORIZED ARMY TESTS-USE IN SELECTION AND ASSIGNMENT

Section I. ORIENTATION

115. General 

The testing instruments that have been developed 
for use in connection with the mobilization and gen- 
eral classification of men in the Army have been 
discussed in previous chapters. The nature of the 
information obtained by means of these instruments
will be clarified in this chapter with particular
reference to the usefulness of the data in selecting men
for specialist training and for assignment to the 
particular jobs specified in the tables of organiza- 
tion of the various units. Upon the completion of 
basic military training, each man in the Army is 
assigned to a unit, and eventually to a particular 
job in that unit. All of these jobs have been ana- 
lyzed and the duties and requirements described in 
TM 12-427. For many of them, no special prep- 
aration is necessary beyond that obtained in basic 
and unit training, in field exercises and maneuvers. 
For a large number of jobs, however, special train- 
ing in the training centers and schools of the vari- 
ous arms and services is required before the duties 
can be carried out effectively. It is with this latter 
group, for the most part, that the problems of se- 
lection and assignment have been most urgent. 

116. Selection for Training

When men are to be trained as specialists, it is
obviously essential to insure that only those men are
selected for this training who have a reasonable
chance of completing the course satisfactorily. When
a large pool of qualified men is available, the
selection problem is simplified. Previous experience
in related fields, education, information, and
interests, supplemented by tests of present skills,
will reveal those most likely to be satisfactory.
When few men clearly qualified on these bases are
available, aptitude tests must be used to identify
the promising men who have no civilian experience
in related activities. A number of such aptitude
tests have been developed and used in the selection
of men for specialist training. They are especially
important in evaluating the potential usefulness
of younger men who have had little or no job
experience previous to induction.



117. Selection for Direct Assignment or 
Reassignment 

The progress of any war is accompanied by changing
requirements in techniques, materiel, and men
skilled in their use. Victory stems from ability to
adjust rapidly to these changes requiring shifting
emphases on the various arms and services. Some
specialties are reduced in importance; the need for
others is augmented. Consequently, men must be
reclassified, often retrained, and reassigned. In
addition, large numbers of men are returned from
the active theaters as casuals to be redistributed
and, in many cases, to be retrained. In such cases
the evaluation of military skills acquired through
previous Army experience can save much time in
selection and avoid much unnecessary retraining.
Often it will be a matter of checking whether a man
really knows the particular job in which he claims
experience or in which he has been classified. At
other times, it will be a problem of determining
whether he has, by accident or by design, acquired
sufficient skill to bypass all or part of the training
for a particular job. Achievement tests which
evaluate the present knowledge of the individual
relative to the requirements of the job have been
developed for a number of the numerically most
important military occupational specialties (MOS).
(See current issues of FM 21-6.)

118. Value of Tests for Selection 

Tests are employed to select men for training or for
particular assignments on the basis of a demonstrated
relationship between performance on the
tests and subsequent performance in training and/or
on the job. The value of any test, then, will be
a function of the extent of this relationship. Each
Army test is constructed for a specific purpose, and
is a valid instrument when used for that purpose.
(See chs. 3 and 5.) Some tests are valuable for
selecting men for training in particular specialties.
Others, like the Army General Classification Test,
are useful for predicting success in several kinds of
training assignments in which general verbal and
numerical skills are involved. Such tests have
different validities for each of the various assignments.
Where the course calls for considerable exercise of
verbal and numerical abilities, the relationship
between performance on the AGCT and success in the
course will be high. Where these same abilities
are of limited importance, this relationship will be
low, and the test of little value in selection for
training.

119. Expectancy Charts 

The extent of this relationship between test
performance and subsequent success in training (i.e.,
the validity of the test for the specific purpose), can
be illustrated by means of a chart showing what
chances (in 100) a man making a particular score
on a given test has of making an average-or-better
grade in a particular training course. A number
of such expectancy charts are presented in the
remaining sections of this chapter. By reference to
these charts, the classification officer is able to
estimate the probabilities for success of a trainee with
a given test score. According to the chart in
paragraph 121a(1), for example, a soldier receiving a
score of 140 on the Clerical Aptitude Test would
have 84 chances in 100 of receiving an average-or-better
grade in the clerical course. Or, of 100 such
men receiving a score of 140 on the test, 84 can be
expected to achieve average-or-better grades in the
course. This, in general, is the meaning conveyed
by the expectancy chart. There are several limiting
factors, however, that must be considered in
any correct interpretation of these figures.

a. Ability of Those Assigned to Training
Course. Since the predictions of success in training
state the probabilities of reaching or exceeding
the average of the class, it follows that these
probabilities will vary according to the ability level of
the class. Where men are preselected for a course
on the basis of education or test performance, it
will usually follow that the general level of the class
is higher. The average (mean) test score of the
group studied is given in each of the charts.
According to the figures in paragraph 121a(2), for
example, the average Army General Classification
Test score for the 2,947 clerical trainees studied was
121.7, which is well above the average for the Army
as a whole (100). Consequently, one would have
to score higher than 121.7 on the test in order to
have a better-than-even chance for better-than-average
grades in the course. As this chart shows,
men receiving a score of 140 on the Army General
Classification Test have 76 chances in 100 of achieving
average-or-better grades in a clerical course in
which the mean AGCT score of the class is 121.7. It
should be noted that each chart includes, in addition
to the number of cases involved in the study
(N) and the average test score of the group (Mean
Score), also the standard deviation of test scores
(S. D.) and the coefficient of correlation between
test scores and course grades (r).

b. Content of the Particular Course. Since
the expectancy charts are based on a demonstrated
relationship between test scores and grades in the
particular training course under discussion, it
follows that they will continue to serve as guides to
prediction only so long as the content of the course
remains comparable to that for which this demonstrated
relationship was discovered. Where it is
known that the content of the course has undergone
radical revision, the charts should be used with
caution. It is further worth noting, in this respect,
that the predictions embodied in the charts often
relate to achievement in classroom work and not to
those other factors, such as leadership, which may
contribute to a successful conclusion of the course.

c. Stability of Standards. Availability of 
men for particular types of training varies from time 
to time, and course standards are prone to adjust 
themselves to these changing demands for men. 
When many more men are available for training 
than are required as trained specialists, course 
standards are apt to become more rigorous. The 
opposite effect results from a decrease in the avail- 
ability-requirement ratio. Such shifts naturally af- 
fect the probabilities contained in the expectancy 
charts. As standards become more rigorous, the
probability of success for any given test score becomes
less than the figure given in the expectancy
chart. Common sense suggests, therefore, that as 
supply-demand ratios for specialists change, the 
critical test scores required for selection be adjusted 
accordingly. 
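
The way the four figures printed with each chart (N, Mean Score, S. D., and r) bear on the chances shown can be illustrated with a short sketch. The manual does not state how its charts were computed; the sketch below assumes, purely for illustration, that test scores and course grades follow a bivariate normal distribution and that "average" means the class mean grade. The function names `chances_in_100` and `critical_score` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
_STD_NORMAL = NormalDist()


def chances_in_100(score: float, mean: float, sd: float, r: float) -> float:
    """Modelled chances in 100 of an average-or-better grade.

    Assumes a bivariate normal relation between test score and grade:
    given a standardized test score z, the standardized grade has mean
    r * z and variance 1 - r**2, so the chance of reaching the class
    mean grade is Phi(r * z / sqrt(1 - r**2)). Requires -1 < r < 1.
    """
    z = (score - mean) / sd
    return 100 * _STD_NORMAL.cdf(r * z / sqrt(1 - r * r))


def critical_score(target_chances: float, mean: float, sd: float, r: float) -> float:
    """Test score at which the modelled chances reach target_chances.

    Inverse of chances_in_100; requires 0 < target_chances < 100 and
    0 < r < 1.
    """
    z_grade = _STD_NORMAL.inv_cdf(target_chances / 100)
    return mean + sd * z_grade * sqrt(1 - r * r) / r
```

Under this model a man scoring exactly at the class mean has 50 chances in 100, which agrees with the remark in a above that one must score above the class mean for a better-than-even chance. With the figures printed under one Mechanical Aptitude Test chart in this chapter (Mean Score 113.0, SD 15.7, r .39), the sketch gives a little over 57 chances in 100 at a score of 120, close to the 58 printed on that chart. The agreement is suggestive only and does not establish that the charts were constructed in this way. A function such as `critical_score` shows how a cutting score might be moved as standards or supply-demand ratios change, as discussed in c above.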

120. Plan of This Chapter 

The remaining sections of this chapter list tests and
present expectancy tables, where available, for
many of the areas of training in the Army. For
the most part, the various tests are considered within
a framework of military occupational specialties
(MOS) for which the test has been shown to be a
useful instrument. Section II, for example, is
concerned with assignments to training for jobs of
administrative and clerical nature. It deals with a
number of Army jobs in this area, considering for
each the tests that are useful in selection for training
and those that are designed to implement
selection for direct assignment or reassignment. The
particular specialties that are included are not to
be considered as an exhaustive list or the only ones
to which the tests are applicable. Rather, they
should be thought of as being representative of the
general areas or “job families” to which they
belong. An alphabetical listing of the more important
authorized Army tests, with references to the
paragraphs in which they are discussed, is contained
in the index to this manual. The responsible officer
or enlisted personnel technician who desires
information on the availability of tests and their value
in selecting men for certain training or job
assignments may utilize the materials contained in this
chapter; and he may refer to the alphabetical listing
to discover the main training assignments for which
a given test has proved valuable. If these references
contain no information or insufficient information,
the selection problem, if of considerable
importance, may be referred to The Adjutant General’s
Office, Personnel Research Section.

Section II. ADMINISTRATIVE AND 
CLERICAL ASSIGNMENTS 




121. Clerk General (MOS-055) 

a. Selection for Training. (1) Clerical 
Aptitude Test. 



[Expectancy chart: SCORE (140, ...) against chances in 100.]

Chances in 100 that a man receiving one of the above scores will
achieve average or better in training.



(2) Army General Classification Test. 

SCORE    CHANCES IN 100
140      76
120      47
100      20

Chances in 100 that a man receiving one of the above scores will
achieve average or better in training.

(3) Other factors. There appears to be only a 
slight relationship between the age and education of 
trainees and success in clerk’s training. In a typical 
school, the average graduate is 23 years of age and 
has completed one year of college, while the aver- 
age “washout” is 21 years old and has completed 
high school. 

b. Selection for Direct Assignment. (1) 
The Clerical Experience Check List (TC-23ar) is 
useful for obtaining a general appraisal of the man’s 
experience in clerical work. The number and types 
of items checked by the average graduate of a cler- 
ical course are indicated in a supplement to Manual, 
Army Trade Screening Tests, TC-M12. 

(2) The Clerical Achievement Test (TC-24ar) 
is designed to measure the individual’s technical 
knowledge of Army forms and regulations. Scores 
on the test may be used to identify soldiers in this
military occupational specialty who require
refresher training before reassignment. Critical
scores and interpretations are given in Manual 
TC-M12 and supplements. 

122. Clerk-Typist (MOS-405) 

a. Selection for Training. (1) Clerical
Aptitude Test.

(2) Army General Classification Test.

b. Selection for Direct Assignment. (1)
Clerical Experience Check List (TC-23ar).

(2) Clerical Achievement Test (TC-24ar).

(3) Typing Tests are standard paragraphs used
to determine the individual’s typing proficiency. 
Results are obtained in “net words per minute,” 
that is, speed minus errors, and should be inter- 
preted in terms of the standards of speed and ac- 
curacy required for the particular job for which 
assignment is considered. 
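
The phrase "speed minus errors" can be illustrated with a small calculation. The manual does not give the scoring rule used on the Typing Tests, so the sketch below is an assumption throughout: the function name `net_words_per_minute` is hypothetical, the default penalty of one word per error is chosen only for illustration, and the floor at zero is likewise an assumption.

```python
def net_words_per_minute(
    words_typed: int,
    errors: int,
    minutes: float,
    penalty_per_error: float = 1.0,  # assumed penalty in words; not from the manual
) -> float:
    """Gross typing speed less an error penalty, in words per minute."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    gross = words_typed / minutes                   # speed
    penalty = errors * penalty_per_error / minutes  # errors, in the same units
    return max(0.0, gross - penalty)                # assumed floor at zero
```

Under these assumptions, a man who types 300 words in 5 minutes with 10 errors has a gross speed of 60 words per minute and a net speed of 58. The result would then be compared with the speed and accuracy required for the particular job.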

123. Stenographer (MOS-213) 

a. Selection for Training. (1) Clerical
Aptitude Test. 

(2) Army General Classification Test. 

b. Selection for Direct Assignment. (1) 
Clerical Experience Check List (TC-23ar). 

(2) Clerical Achievement Test (TC-24ar). 

(3) Typing Tests (par. 122b(3)).

(4) Dictation Tests are standard paragraphs 
used to determine the individual’s proficiency in 
taking and transcribing shorthand. Scores of 65 
and above indicate satisfactory proficiency at the 
rate of 80 words per minute. Scores below 65 in- 
dicate unsatisfactory performance at that speed. 

(5) Other factors of value in predicting successful
performance as stenographer are education and
previous experience. Commercial school graduates
are likely to be better prepared in the technical
phases of the assignment, i.e., grammar and
punctuation of correspondence. Men with previous
military experience in this field will probably be
more acquainted with the forms of military
correspondence than those whose training has been
chiefly civilian.

124. Supply Clerk (Quartermaster Supplies)
(MOS-835)

a. Selection for Training. (1) Clerical 
Aptitude Test. 

(2) Army General Classification Test. 

b. Selection for Direct Assignment.

(1) The Supply Clerk Experience Check List
(Quartermaster Supplies) (TC-29ar) is a convenient
instrument for securing a general appraisal
of a man’s experience and background in supply and
warehousing clerical work. The number and types
of items checked by the average graduate of the
Supply Clerk’s training course are indicated in a
supplement to Manual, Army Trade Screening
Tests, TC-M12.

(2) Supply Clerk Test (Quartermaster Supplies) 
(TC-30ar) is designed to measure technical knowl- 
edge of the clerical phases of supply and ware- 
housing. Scores on the test may be used to identify 
soldiers in this military occupational specialty 
who require refresher training before reassignment. 
Critical scores and interpretations are given in 
Manual TC-M12 and supplements. 



Section III. AUTOMOTIVE AND 
MECHANICAL ASSIGNMENTS 




125. Truck Driver, Light (MOS-345) and
Truck Driver, Heavy (MOS-931)

Selection for Direct Assignment: 

a. The Truck Driver Experience Check List 
(TC-21ar) is designed to furnish a quick and 
objective estimate of a man’s experience in truck 
driving. The number and types of items checked 
by the average graduate of a motor vehicle oper- 
ation course are contained in the supplement to 
the Manual TC-M12. 

b. The Truck Driver Test (TC-22ar) measures
technical knowledge relating to the operation of 
Army motor vehicles. Scores on the test may be 
used to identify soldiers in this military occupa- 
tional specialty who require refresher training 
before reassignment. Critical scores and inter- 
pretations are given in Manual TC-M12 and 
supplements. 






TM 12-260 

126. Mechanic, Automotive (Second Echelon)
(MOS-014); Mechanic, Automotive,
Wheel Vehicle (Third Echelon)
(MOS-965); and Mechanic, Automotive,
Track Vehicle (Third Echelon)
(MOS-966)

a. Selection for Training. Mechanical 
Aptitude Test. 



[Expectancy chart: SCORE (140, 120, ... 80) against chances in 100.]

N      Mean Score    SD      r
330    105.0         17.2    .54

Chances in 100 that a man receiving one of the above scores will
achieve average or better in training.



b. Testing for Placement in Course. The 
Army Automotive Screening Test Battery was 
designed for use in placing men in the course E-70 
(Wheeled Vehicle Automotive Mechanics) at the 
Ordnance School. This course is composed of 3 
phases of 4 weeks each, and is basic to all further 
specialist courses in automotive mechanics. Phase 
I deals with 1st echelon maintenance, and can be 
omitted, with a profitable saving in training time 
and facilities, by men with adequate previous 
civilian or military experience in automotive 
maintenance. The Army Automotive Screening 
Test Battery is used to identify such men. There
are 5 tests in the battery.

(1) The Auto Mechanic Experience Check List
(TC-13a) is used as an objective means of evaluating
past experience in auto mechanics’ work.

(2) Tool Usage Film Strip Test (TC-12a).

[Expectancy chart: SCORE against chances in 100.]

N      Mean Score    SD      r
                     21.7    .62

Chances in 100 that a man receiving one of the above scores will
achieve average or better grades in Phase I of Course E-70
(Wheeled Vehicle Automotive Mechanics), Ordnance
School.



(3) Auto Mechanic Test (TC-14ar).

[Expectancy chart: SCORE against chances in 100.]

Chances in 100 that a man receiving one of the above scores will
achieve average or better grades in Phase I of Course E-70
(Wheeled Vehicle Automotive Mechanics), Ordnance
School.

(4) Distributor and Valves Test (TC-15a).

[Expectancy chart: SCORE against chances in 100.]

Chances in 100 that a man receiving one of the above scores will
achieve average or better grades in Phase I of Course E-70
(Wheeled Vehicle Automotive Mechanics), Ordnance
School.

(5) Use-of-Tools Test (TC-16a).

[Expectancy chart: SCORE against chances in 100.]

Chances in 100 that a man receiving one of the above scores will
achieve average or better grades in Phase I of Course E-70
(Wheeled Vehicle Automotive Mechanics), Ordnance
School.

c. Selection for Direct Assignment.
(1) Auto Mechanic Experience Check List
(TC-13a).

(2) Auto Mechanic Test (TC-14ar).



127. Tank Mechanic, Minor Maintenance
(MOS-660)

Selection for Training:
a. Mechanical Aptitude Test.

[Expectancy chart: SCORE against chances in 100.]

Chances in 100 that a man receiving one of the above scores will
achieve average or better in training.

b. Other factors of value in selecting men for 
training as tank mechanics are previous occupation 
and education. Men with civilian experience in 
automotive work stand a better chance than those 
without. Evidence also points to the desirability 
of 10 years of schooling. 

128. Airplane and Engine Mechanic (MOS-747)

Selection for Training:

a. Mechanical Aptitude Test.

[Expectancy chart: SCORE against chances in 100.]

Chances in 100 that a man receiving one of the above scores will
achieve average or better in training.

b. Army General Classification Test.

[Expectancy chart: SCORE against chances in 100.]

Chances in 100 that a man receiving one of the above scores will
achieve average or better in training.

c. Trade Information Test (TC-1a).

[Expectancy chart: SCORE against chances in 100.]

Chances in 100 that a man receiving one of the above scores will
achieve average or better in training.

d. General Technical Test (TC-2a).

[Expectancy chart: SCORE against chances in 100.]

Chances in 100 that a man receiving one of the above scores will
achieve average or better in training.

e. Technical Trade Test (TC-7a) is composed
of items selected from TC-1a and TC-2a.

f. Nut and Bolt Manual Dexterity Test (TC-5a).

[Expectancy chart: SCORE against chances in 100.]

Chances in 100 that a man receiving one of the above scores will
achieve average or better in training.

g. The U-Bolt Test (TC-6a).

[Expectancy chart: SCORE against chances in 100.]

Chances in 100 that a man receiving one of the above scores will
achieve average or better in training.



129. Airplane Armorer (MOS-911) 

Selection for Training: 

a. Mechanical Aptitude Test.

[Expectancy chart: SCORE against chances in 100.]

Chances in 100 that a man receiving one of the above scores will
achieve average or better in training.

b. Trade Information Test (TC-1a).

[Expectancy chart: SCORE against chances in 100.]

Chances in 100 that a man receiving one of the above scores will
achieve average or better in training.

c. General Technical Test (TC-2a).

[Expectancy chart: SCORE against chances in 100.]

Mean Score    SD      r
1(7.2         14.8    .44

Chances in 100 that a man receiving one of the above scores will
achieve average or better in training.

d. Technical Trade Test (TC-7a).

[Expectancy chart: SCORE against chances in 100.]

Chances in 100 that a man receiving one of the above scores will
achieve average or better in training.

e. Nut and Bolt Manual Dexterity Test (TC-5a).

[Expectancy chart: SCORE (140, ...) against chances in 100.]

Chances in 100 that a man receiving one of the above scores will
achieve average or better in training.

f. U-Bolt Assembly Test (TC-6a).

[Expectancy chart: SCORE against chances in 100.]

Chances in 100 that a man receiving one of the above scores will
achieve average or better in training.



130. Machinist (MOS-114) 

Selection for Direct Assignment: 

a. The Machinist Experience Check List 
(TC-17ar) provides a convenient method for se- 
curing a general appraisal of a man’s background 
in machine work. The number and types of items 
checked by the average graduate of a machinist
course are indicated in a supplement to Manual 
TC-M12. 

b. The Machinist Test (TC-18ar) is designed
to measure technical knowledge essential in ma- 
chine work. Scores on the test may be used to 
identify soldiers in this military occupational 
specialty who require refresher training before 
reassignment. Critical scores and interpretations 
are given in Manual TC-M12 and supplements.

131. Carpenter (MOS-050)

Selection for Direct Assignment:

a. The Carpenter Experience Check List
(TC-25ar) is a device for securing an objective 
evaluation of experience and training in carpentry. 
The number and types of items checked by the 
average graduate of a carpentry course are indi- 
cated in Manual TC-M12 and supplements. 

b. The Carpenter Test (TC-26ar) is designed
to measure technical knowledge in carpentry. 
Scores on the test may be used to identify soldiers 
in this military occupational specialty who re- 
quire refresher training before reassignment. Criti- 
cal scores and interpretations are given in Manual 
TC-M12 and supplements. 

132. Welder Combination (MOS-256) 

Selection for Direct Assignment: 

a. The Welding Experience Check List 
(TC-19ar) is of value in obtaining an objective 
evaluation of a man’s experience and training in 
acetylene and electric arc welding. The number 
and types of items checked by the average graduate 
of a welding course are indicated in Manual TC-M12 
and supplements. 

b. The Welding Test (TC-20ar) is designed to 
measure technical knowledge of operations involved 
in acetylene and electric arc welding. Scores on 
the test may be used to identify soldiers in this 
military occupational specialty who require re- 
fresher training before reassignment. Critical scores 
and interpretations are given in Manual TC-M12 
and supplements. 



Section IV. MISCELLANEOUS TECHNICAL 
SPECIALTIES 




133. Aircraft Warning Specialists 

a. Information Center Operator (MOS-510). 

b. Aircraft Observer, Ground (MOS-518). 

c. Radar Crewman (Designated Set) (MOS-514)
- selection for training. (1) Aircraft Warning
Aptitude Test (TC-10a).

(2) Aircraft Warning Classification Test
(TC-11a).

For information on the use of these tests in con- 
nection with the selection and classification of 
Aircraft Warning Specialists, see Manual, Classifi- 
cation of Aircraft Warning Trainees, TC-M2. 

134. Cook (MOS-060) 

Selection for Direct Assignment: 

a. The Cook Experience Check List (TC-27ar) 
is a means of obtaining an objective evaluation of a 
man’s experience and training as a cook. The 
number and types of items checked by the average 
graduate of a cooks’ course are indicated in Manual 
TC-M12 and supplements. 

b. The Cook Test (TC-28ar) is designed to 
measure technical knowledge of operations and 
equipment in the field of cooking. Scores on the 
test may be used to identify soldiers in this mili- 
tary occupational specialty who require refresher 
training before reassignment. Critical scores and 
interpretations are given in Manual TC-M12 and 
supplements. 

135. Cryptographic Technician (MOS-805),
Cryptographic Code Compiler (MOS-807),
and Cryptanalysis Technician (MOS-808)

Selection for Training:

a. Army General Classification Test.

[Expectancy chart: SCORE against chances in 100.]

Chances in 100 that a man receiving one of the above scores will
achieve average or better in training.

b. Clerical Aptitude Test.

[Expectancy chart: SCORE against chances in 100.]

Chances in 100 that a man receiving one of the above scores will
achieve average or better in training.



c. Cryptography Test (TC-4a).

[Expectancy chart: SCORE against chances in 100.]

Chances in 100 that a man receiving one of the above scores will
achieve average or better in training.

d. Other factors. In most instances, trainees
and men assigned to this work must be citizens and
must be approved by a G-2 investigation. A
background of training or experience in mathematics or
languages is generally considered desirable. In
addition, the Cryptographic Technician must be
able to type at the rate of 25 words per minute.

136. Radio Operator, AAF (MOS-756);
Radio Operator-Mechanic-Gunner, AAF
(MOS-757); Radio Operator, Low
Speed, AAF (MOS-776); and Radio
Operator and Mechanic, AAF (MOS-2756)

Selection for Training:

a. Mechanical Aptitude Test.

SCORE    CHANCES IN 100
120      58
100      36
80       18

N      Mean Score    SD      r
690    113.0         15.7    .39

Chances in 100 that a man receiving one of the above scores will
achieve average or better in training.

b. Army Radio Code Aptitude Test.

[Expectancy chart: SCORE against chances in 100.]

Chances in 100 that a man receiving one of the above scores will
achieve average or better in training.



137. Weather Observer (MOS-784) 

Selection for Training: 



a. Army General Classification Test.

[Expectancy chart: SCORE against chances in 100.]

Chances in 100 that a man receiving one of the above scores will
achieve average or better in training.

b. Weather Aptitude Test (TC-3a).




c. Other factors. Training in physics (high 
school or college) is essential. A knowledge of 
algebra and prior training and/or experience in 
meteorology are valuable assets. 

138. Officer Candidate (MOS-625) 

Selection for Training: 



a. Army General Classification Test. 

SCORE 
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306 126.5 1 10.0 .38 



Chances in 100 that a man receiving one of the above scores will 
achieve average or belter academic grades in training. 



*A score of 110 on AGCT is required for selection as a 
candidate for officer training. 



b. Officer Candidate Test (OCT-1 and OCT-2).

SCORE 
Chances in 100 that a man receiving one of the above scores will
achieve average or better academic grades in training. 

c. Other factors. Next to academic proficiency,
leadership is of paramount importance for
candidates for officer training. Weakness in one or
the other of these two areas accounts for 75 to
90 percent of all failures in Officer Candidate
Schools. Until the qualities of leadership have
been isolated and more objectively defined, no
tests can be developed for estimating these
qualities. Nevertheless, evaluations of leadership traits
by OCS selection agencies can be rendered more
standard and objective by a proper utilization of
WD AGO Form 240 (Interviewing and Rating of
Applicants for Officer Candidate School). The
principles of interviewing and rating outlined in
Chapter 6 of this manual are applicable in this
connection.



Section V. OTHER TESTS 

139. Individual Test of General Mental Ability

a. The Army Individual Test is designed for 
use in connection with personnel problems in which 
it appears likely that an individual instrument 
will provide a better estimate of general learning 
ability than the usual group test. It measures the 
same general abilities as the Army General Classi- 
fication Test. In addition, since it is adminis- 
tered individually and involves a variety of verbal 
and performance materials, it provides the examiner 
an opportunity to control motivation more effec- 
tively and to observe and evaluate the examinee’s 
behavior more fully. 

b. The Individual Target Test is a means of 
determining aptitude for assimilating basic mili- 
tary training among men who are slow learners, 
illiterate or semi-illiterate or limited in verbal and 
numerical skills. Scores are interpreted in terms 
of probable soldier proficiency of the examinees. 
It is applicable only to men with limited general 
mental ability, and should be employed, in such
cases, as an objective adjunct to other available 
means of estimating chances for success in basic 
military assignments. 

c. The Nonlanguage Test, 2abc is designed to 
grade Army personnel in terms of their ability to 
learn Army duties. As the name implies, the test
minimizes the use of language. It is composed of 
three subtests involving box counting, symbol 
association and substitution, and design com- 
parison. It is administered as a group test, all 
directions being given in pantomime. Scores on 
the test are distributed in much the same fashion 
as AGCT scores, and the relationship between 
the two tests is fairly close. The Nonlanguage 
Test, 2abc was formerly given in reception centers 
to all men who scored Grade V on the AGCT. 
Since such men are now assigned to Special Training 
Units, its use is now largely confined to training 
centers and units where it is occasionally employed 
as a check on low AGCT scores. 

140. Warrant Officer Examinations 

Applicants for appointment as warrant officers 
must pass written technical examinations in addi- 
tion to satisfying the other requirements of ex- 
perience and training. Separate examinations are 
available for each of the various classifications in 
which warrant officers may be appointed. The
duties involved in each classification and the scope 
of the appropriate examinations are outlined in 
AR 610-10 and 610-15. 

Supply clerk test (quartermaster supplies) (TC-30ar)
Tank mechanic, minor maintenance (MOS-660)
Technical trade test (TC-7a)
Testing situation, physical surroundings
Tests:
  Achievement
  Administration
  Aptitude
  Classification
  Comparison with other measures
  Construction
  Continuing studies
  Critical scores on
  Directions for scoring
  Induction station
  Reliability
  Standardization
  Standardization population
Tool usage film strip test (TC-12a)
Trade information test (TC-1a)
Truck driver experience check list (TC-
Truck driver, heavy (MOS-931)
Truck driver, light (MOS-345)
Truck driver test (TC-22ar)
Use-of-tools test (TC-16a)
Validity:
Warrant officer examinations
WD AGO Form 20 (Soldier's Qualifi-
WD AGO Form 240 (Interviewing and Rating of Applicants for OCS)
Weather aptitude test (TC-3a)
Weather observer (MOS-784)
Welder, combination (MOS-256)
Welding experience check list (TC-19ar)
Wells' concrete directions test

☆ U. S. GOVERNMENT PRINTING OFFICE: 1946-692201